Browsing Linguistics Datasets by Title

WakeSpace Repository

  • Taulé, Mariona; Martí, Maria Antonia; Bies, Ann; Gari, Aina; Nofre, Montserrat; Song, Zhiyi; Strassel, Stephanie; Ellis, Joe (Linguistic Data Consortium, 2018-01-16)
    DEFT Spanish Treebank was developed by the Linguistic Data Consortium (LDC) and the Language and Computation Center (CLiC), University of Barcelona. It contains treebank annotation of international Spanish newswire text ...
  • Unknown author (2012-11-06)
    Digital Archive of Southern Speech (DASS), Linguistic Data Consortium (LDC) catalog number LDC2012S03 and ISBN 1-58563-603-7, was developed by the University of Georgia. It is a subset of the Linguistic Atlas of the Gulf ...
  • Linguistic Data Consortium (Linguistic Data Consortium, 1994)
    The first release of the European Corpus Initiative, the Multilingual Corpus 1 (ECI/MCI), has 46 subcorpora in 27 (mainly European) languages. The total size of these is roughly 92 million (lexical) words. The corpora are ...
  • Bies, Ann; Mott, Justin; Warner, Colin (Linguistic Data Consortium, 2015-07-15)
    English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It consists of a combination of automated and manual revisions of the ...
  • Muir, Kate; Joinson, Adam; Cotterill, Rachel; Dewdney, Nigel (Linguistic Data Consortium, 2016-07-15)
    English Speed Networking Conversational Transcripts was developed at the University of the West of England and contains 388 transcripts of English face-to-face and instant messaging conversations about business ideas ...
  • Unknown author (2021-12-13)
  • Unknown author (2013-03-27)
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie; Ma, Xiaoyi; Xue, Nianwen; Marcus, Mitch; Taylor, Ann (Linguistic Data Consortium, 2015-03-16)
    GALE Chinese-English Parallel Aligned Treebank -- Training was developed by the Linguistic Data Consortium (LDC) and contains 229,249 tokens of word aligned Chinese and English parallel text with treebank annotations. This ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2014-11-15)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2 was developed by the Linguistic Data Consortium (LDC) and contains 65,069 tokens of word aligned Chinese and English parallel text enriched with ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2015-02-16)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3 was developed by the Linguistic Data Consortium (LDC) and contains 242,020 tokens of word aligned Chinese and English parallel text enriched with ...
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2015-09-15)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 4 was developed by the Linguistic Data Consortium (LDC) and contains 243,038 tokens of word aligned Chinese and English parallel text enriched with ...
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-03-17)
    GALE Phase 2 Chinese Broadcast News Parallel Text Part 1 was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA ...
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-06-16)
    GALE Phase 2 Chinese Broadcast News Parallel Text Part 2 was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA ...
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-07-15)
    GALE Phase 2 Chinese Newswire Parallel Text Part 1 was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE ...
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-09-15)
    GALE Phase 2 Chinese Newswire Parallel Text Part 2 was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE ...
  • Friedman, Lauren; Jin, Hubert; Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2014-11-15)
    GALE Phase 2 Chinese Web Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 2 of the DARPA GALE (Global ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-03-15)
    GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-07-15)
    GALE Phase 3 and 4 Chinese Broadcast News Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-11-15)
    GALE Phase 3 and 4 Chinese Newswire Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the DARPA ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2015-06-15)
    GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the ...