GALE Chinese-English Parallel Aligned Treebank -- Training

WakeSpace Repository

dc.contributor.author Li, Xuansong
dc.contributor.author Grimes, Stephen
dc.contributor.author Strassel, Stephanie
dc.contributor.author Ma, Xiaoyi
dc.contributor.author Xue, Nianwen
dc.contributor.author Marcus, Mitch
dc.contributor.author Taylor, Ann
dc.date.accessioned 2016-11-28T15:46:50Z
dc.date.available 2016-11-28T15:46:50Z
dc.date.issued 2015-03-16
dc.identifier.citation Li, Xuansong, et al. GALE Chinese-English Parallel Aligned Treebank -- Training LDC2015T06. Web Download. Philadelphia: Linguistic Data Consortium, 2015. en_US
dc.identifier.isbn 1-58563-708-4
dc.identifier.uri http://hdl.handle.net/10339/63121
dc.description.abstract GALE Chinese-English Parallel Aligned Treebank -- Training was developed by the Linguistic Data Consortium (LDC) and contains 229,249 tokens of word-aligned Chinese and English parallel text with treebank annotations. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program. Parallel aligned treebanks are treebanks annotated with morphological and syntactic structures and aligned at both the sentence level and the sub-sentence level. Such data sets are useful for natural language processing and related fields, including automatic word alignment system training and evaluation, transfer-rule extraction, word sense disambiguation, translation lexicon extraction, and cultural heritage and cross-linguistic studies. For machine translation system development, parallel aligned treebanks may improve system performance through enhanced syntactic parsers, better rules and knowledge about language pairs, and reduced word error rate. The Chinese source data was translated into English. Chinese and English treebank annotations were performed independently, and the parallel texts were then word aligned. The material in this release corresponds to portions of the Chinese treebanked data in Chinese Treebank 6.0 (LDC2007T36) (CTB), OntoNotes 3.0 (LDC2009T24) and OntoNotes 4.0 (LDC2011T03). en_US
dc.publisher Linguistic Data Consortium en_US
dc.title GALE Chinese-English Parallel Aligned Treebank -- Training en_US
dc.type Dataset en_US
