Linguistics Datasets

WakeSpace Repository

Recent Submissions

  • Unknown author (2017-04-04)
  • Bies, Ann; Mott, Justin; Warner, Colin (Linguistic Data Consortium, 2015-07-15)
    English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It consists of a combination of automated and manual revisions of the ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2015-09-15)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 4 was developed by the Linguistic Data Consortium (LDC) and contains 243,038 tokens of word-aligned Chinese and English parallel text enriched with ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-01-18)
    GALE Phase 4 Chinese Weblog Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE (Global ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-11-15)
    GALE Phase 3 and 4 Chinese Newswire Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the DARPA ...
