Abstract:
GALE Phase 2 Chinese Newswire Parallel Text Part 2 was developed by the Linguistic Data Consortium (LDC). Together with other corpora, the parallel text in this release served as training data for Phase 2 of the DARPA GALE (Global Autonomous Language Exploitation) Program. The corpus contains 117,895 tokens of Chinese source text and the corresponding English translations. The source text was selected from newswire data collected by LDC in 2007 and was translated by LDC or under its direction.
LDC has also released GALE Phase 2 Chinese Newswire Parallel Text Part 1 (LDC2014T15).