GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3

WakeSpace Repository

dc.contributor.author Li, Xuansong
dc.contributor.author Grimes, Stephen
dc.contributor.author Strassel, Stephanie
dc.date.accessioned 2016-11-28T15:49:38Z
dc.date.available 2016-11-28T15:49:38Z
dc.date.issued 2015-02-16
dc.identifier.citation Li, Xuansong, Stephen Grimes, and Stephanie Strassel. GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3 LDC2015T04. Web Download. Philadelphia: Linguistic Data Consortium, 2015. en_US
dc.identifier.isbn 1-58563-705-X
dc.identifier.uri http://hdl.handle.net/10339/63122
dc.description.abstract GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3 was developed by the Linguistic Data Consortium (LDC) and contains 242,020 tokens of word-aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program. Some approaches to statistical machine translation incorporate linguistic knowledge into word-aligned text as a means of improving automatic word alignment and machine translation quality. This is accomplished with two annotation schemes: alignment and tagging. Alignment identifies minimum translation units and translation relations using minimum-match and attachment annotation approaches. The tagging scheme defines a set of word tags and alignment link tags to describe these translation units and relations. Tagging adds contextual, syntactic and language-specific features to the alignment annotation. Other releases available in this series are: GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web (LDC2012T16); GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire (LDC2012T20); GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web (LDC2012T24); GALE Chinese-English Word Alignment and Tagging Training Part 4 -- Web (LDC2013T05); GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 1 (LDC2013T23); and GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2 (LDC2014T25). en_US
dc.publisher Linguistic Data Consortium en_US
dc.title GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3 en_US
dc.type Dataset en_US