Abstract:
GALE Phase 4 Chinese Weblog Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Together with other corpora, the parallel text in this release made up the training data for Phase 4 of the DARPA GALE (Global Autonomous Language Exploitation) Program. This corpus contains Chinese source sentences and corresponding English translations selected from newsgroup and weblog data collected by LDC and translated by LDC or under its direction.