Browsing Linguistics Datasets by Submit Date

WakeSpace Repository

  • Unknown author (2012-10-26)
  • Unknown author (2013-03-27)
  • Unknown author (2013-12-18)
  • Unknown author (2013)
  • Unknown author (2013)
  • Unknown author (2014)
  • Unknown author (2015-08-31)
  • Sauri, Roser; Domingo, Judith; Badia, Toni (2015-09-16)
  • CHM150 
    Mena, Carlos; Herrera, Abel (Linguistic Data Consortium, 2016-06-15)
    CHM150 (Corpus Hecho en México 150) was developed by the Speech Processing Laboratory of the Faculty of Engineering at the National Autonomous University of Mexico (UNAM) and consists of approximately 1.63 hours of Mexican ...
  • Linguistic Data Consortium (Linguistic Data Consortium, 1994)
    The first release of the European Corpus Initiative, the Multilingual Corpus 1 (ECI/MCI), has 46 subcorpora in 27 (mainly European) languages. The total size of these is roughly 92 million (lexical) words. The corpora are ...
  • Doran, Christine; Burger, John; Henderson, John; Zarrella, Guido (Linguistic Data Consortium, 2013-01-15)
    Chinese-English Biology and Chemistry Abstract Parallel Text was developed by The MITRE Corporation. It consists of parallel sentences from a collection of chemistry and biology-related scientific article abstracts published ...
  • Muir, Kate; Joinson, Adam; Cotterill, Rachel; Dewdney, Nigel (Linguistic Data Consortium, 2016-07-15)
    English Speed Networking Conversational Transcripts was developed at the University of the West of England and contains 388 transcripts of English face-to-face and instant messaging conversations about business ideas ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie; Ma, Xiaoyi; Xue, Nianwen; Marcus, Mitch; Taylor, Ann (Linguistic Data Consortium, 2015-03-16)
    GALE Chinese-English Parallel Aligned Treebank -- Training was developed by the Linguistic Data Consortium (LDC) and contains 229,249 tokens of word-aligned Chinese and English parallel text with treebank annotations. This ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2015-02-16)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3 was developed by the Linguistic Data Consortium (LDC) and contains 242,020 tokens of word-aligned Chinese and English parallel text enriched with ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-03-15)
    GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2015-06-15)
    GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2015-10-15)
    GALE Phase 4 Chinese Broadcast News Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2015-11-16)
    GALE Phase 4 Chinese Newswire Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2014-11-15)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2 was developed by the Linguistic Data Consortium (LDC) and contains 65,069 tokens of word-aligned Chinese and English parallel text enriched with ...