dc.contributor.author |
Li, Xuansong |
|
dc.contributor.author |
Grimes, Stephen |
|
dc.contributor.author |
Strassel, Stephanie |
|
dc.contributor.author |
Ma, Xiaoyi |
|
dc.contributor.author |
Xue, Nianwen |
|
dc.contributor.author |
Marcus, Mitch |
|
dc.contributor.author |
Taylor, Ann |
|
dc.date.accessioned |
2016-11-28T15:46:50Z |
|
dc.date.available |
2016-11-28T15:46:50Z |
|
dc.date.issued |
2015-03-16 |
|
dc.identifier.citation |
Li, Xuansong, et al. GALE Chinese-English Parallel Aligned Treebank -- Training LDC2015T06. Web Download. Philadelphia: Linguistic Data Consortium, 2015. |
en_US |
dc.identifier.isbn |
1-58563-708-4 |
|
dc.identifier.uri |
http://hdl.handle.net/10339/63121 |
|
dc.description.abstract |
GALE Chinese-English Parallel Aligned Treebank -- Training was developed by the Linguistic Data Consortium (LDC) and contains 229,249 tokens of word-aligned Chinese and English parallel text with treebank annotations. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program.
Parallel aligned treebanks are treebanks annotated with morphological and syntactic structures aligned at the sentence and sub-sentence levels. Such data sets are useful for natural language processing and related fields, including training and evaluation of automatic word alignment systems, transfer-rule extraction, word sense disambiguation, translation lexicon extraction, and cultural heritage and cross-linguistic studies. For machine translation system development, parallel aligned treebanks may improve system performance through enhanced syntactic parsers, better rules and knowledge about language pairs, and reduced word error rate.
The Chinese source data was translated into English. Chinese and English treebank annotations were performed independently. The parallel texts were then word-aligned. The material in this release corresponds to portions of the Chinese treebanked data in Chinese Treebank 6.0 (CTB, LDC2007T36), OntoNotes 3.0 (LDC2009T24) and OntoNotes 4.0 (LDC2011T03). |
en_US |
dc.publisher |
Linguistic Data Consortium |
en_US |
dc.title |
GALE Chinese-English Parallel Aligned Treebank -- Training |
en_US |
dc.type |
Dataset |
en_US |