dc.contributor.author |
Tracey, Jennifer |
|
dc.contributor.author |
Lee, Haejoong |
|
dc.contributor.author |
Strassel, Stephanie |
|
dc.date.accessioned |
2019-03-27T15:13:26Z |
|
dc.date.available |
2019-03-27T15:13:26Z |
|
dc.date.issued |
2019-03-27 |
|
dc.date.issued |
2017-07-18 |
|
dc.identifier.citation |
Tracey, Jennifer, Haejoong Lee, and Stephanie Strassel. BOLT English Discussion Forums LDC2017T11. Web Download. Philadelphia: Linguistic Data Consortium, 2017. |
en_US |
dc.identifier.isbn |
1-58563-806-4 |
|
dc.identifier.uri |
http://hdl.handle.net/10339/93642 |
|
dc.description.abstract |
BOLT English Discussion Forums was developed by the Linguistic Data Consortium (LDC) and consists of 830,440 discussion forum threads in English harvested from the Internet using a combination of manual and automatic processes.
The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference. The material in this release represents the unannotated English source data in the discussion forum genre. |
en_US |
dc.title |
BOLT English Discussion Forums |
en_US |