Linguistics Datasets

WakeSpace Repository

Recent Submissions

  • Taulé, Mariona; Martí, Maria Antonia; Bies, Ann; Gari, Aina; Nofre, Montserrat; Song, Zhiyi; Strassel, Stephanie; Ellis, Joe (Linguistic Data Consortium, 2018-01-16)
    DEFT Spanish Treebank was developed by the Linguistic Data Consortium (LDC) and the Language and Computation Center (CLiC), University of Barcelona. It contains treebank annotation of international Spanish newswire text ...
  • Unknown author (2018-01-03)
  • Unknown author (2017-04-04)
  • Bies, Ann; Mott, Justin; Warner, Colin (Linguistic Data Consortium, 2015-07-15)
    English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It consists of a combination of automated and manual revisions of the ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2015-09-15)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 4 was developed by the Linguistic Data Consortium (LDC) and contains 243,038 tokens of word-aligned Chinese and English parallel text enriched with ...