Browsing Linguistics Datasets by Issue Date

WakeSpace Repository

  • Linguistic Data Consortium (Linguistic Data Consortium, 1994)
    The first release of the European Corpus Initiative, the Multilingual Corpus 1 (ECI/MCI), has 46 subcorpora in 27 (mainly European) languages. The total size of these is roughly 92 million (lexical) words. The corpora are ...
  • Unknown author (2012-10-26)
  • Unknown author (2012-11-06)
    Digital Archive of Southern Speech (DASS), Linguistic Data Consortium (LDC) catalog number LDC2012S03 and ISBN 1-58563-603-7, was developed by the University of Georgia. It is a subset of the Linguistic Atlas of the Gulf ...
  • Unknown author (2013)
  • Unknown author (2013)
  • Doran, Christine; Burger, John; Henderson, John; Zarrella, Guido (Linguistic Data Consortium, 2013-01-15)
    Chinese-English Biology and Chemistry Abstract Parallel Text was developed by The MITRE Corporation. It consists of parallel sentences from a collection of chemistry and biology-related scientific article abstracts published ...
  • Unknown author (2013-03-27)
  • Unknown author (2013-12-18)
  • Unknown author (2014)
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-03-17)
    GALE Phase 2 Chinese Broadcast News Parallel Text Part 1 was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA ...
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-06-16)
    GALE Phase 2 Chinese Broadcast News Parallel Text Part 2 was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA ...
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-07-15)
    GALE Phase 2 Chinese Newswire Parallel Text Part 1 was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE ...
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-09-15)
    GALE Phase 2 Chinese Newswire Parallel Text Part 2 was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2014-11-15)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2 was developed by the Linguistic Data Consortium (LDC) and contains 65,069 tokens of word aligned Chinese and English parallel text enriched with ...
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-11-15)
    GALE Phase 2 Chinese Web Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE (Global ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2015-02-16)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3 was developed by the Linguistic Data Consortium (LDC) and contains 242,020 tokens of word aligned Chinese and English parallel text enriched with ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie; Ma, Xiaoyi; Xue, Nianwen; Marcus, Mitch; Taylor, Ann (Linguistic Data Consortium, 2015-03-16)
    GALE Chinese-English Parallel Aligned Treebank -- Training was developed by the Linguistic Data Consortium (LDC) and contains 229,249 tokens of word aligned Chinese and English parallel text with treebank annotations. This ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2015-06-15)
    GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the ...
  • Bies, Ann; Mott, Justin; Warner, Colin (Linguistic Data Consortium, 2015-07-15)
    English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It consists of a combination of automated and manual revisions of the ...