BOLT English Discussion Forums

WakeSpace Repository

dc.contributor.author Tracey, Jennifer
dc.contributor.author Lee, Haejoong
dc.contributor.author Strassel, Stephanie
dc.date.accessioned 2019-03-27T15:13:26Z
dc.date.available 2019-03-27T15:13:26Z
dc.date.issued 2019-03-27
dc.date.issued 2017-07-18
dc.identifier.citation Tracey, Jennifer, Haejoong Lee, and Stephanie Strassel. BOLT English Discussion Forums LDC2017T11. Web Download. Philadelphia: Linguistic Data Consortium, 2017. en_US
dc.identifier.isbn 1-58563-806-4
dc.identifier.uri http://hdl.handle.net/10339/93642
dc.description.abstract BOLT English Discussion Forums was developed by the Linguistic Data Consortium (LDC) and consists of 830,440 discussion forum threads in English harvested from the Internet using a combination of manual and automatic processes. The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference. The material in this release represents the unannotated English source data in the discussion forum genre. en_US
dc.title BOLT English Discussion Forums en_US

