Browsing Linguistics Datasets by Issue Date

WakeSpace Repository

  • Unknown author (2015-08-31)
  • Li, Xuansong; Grimes, Stephen; Strassel, Stephanie (Linguistic Data Consortium, 2015-09-15)
    GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 4 was developed by the Linguistic Data Consortium (LDC) and contains 243,038 tokens of word aligned Chinese and English parallel text enriched with ...
  • Sauri, Roser; Domingo, Judith; Badia, Toni (2015-09-16)
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2015-10-15)
    GALE Phase 4 Chinese Broadcast News Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2015-11-16)
    GALE Phase 4 Chinese Newswire Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-01-18)
    GALE Phase 4 Chinese Weblog Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE (Global ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-03-15)
    GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 ...
  • CHM150 
    Mena, Carlos; Herrera, Abel (Linguistic Data Consortium, 2016-06-15)
    CHM150 (Corpus Hecho en México 150) was developed by the Speech Processing Laboratory of the Faculty of Engineering at the National Autonomous University of Mexico (UNAM) and consists of approximately 1.63 hours of Mexican ...
  • Muir, Kate; Joinson, Adam; Cotterill, Rachel; Dewdney, Nigel (Linguistic Data Consortium, 2016-07-15)
    English Speed Networking Conversational Transcripts was developed at the University of the West of England and contains 388 transcripts of English face-to-face and instant messaging conversations about business ideas ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-07-15)
    GALE Phase 3 and 4 Chinese Broadcast News Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the ...
  • Song, Zhiyi; Krug, Gary; Strassel, Stephanie (Linguistic Data Consortium, 2016-11-15)
    GALE Phase 3 and 4 Chinese Newswire Parallel Text was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phases 3 and 4 of the DARPA ...
  • Unknown author (2017-04-04)
  • Unknown author (2018-01-03)
  • Taulé, Mariona; Martí, Maria Antonia; Bies, Ann; Gari, Aina; Nofre, Montserrat; Song, Zhiyi; Strassel, Stephanie; Ellis, Joe (Linguistic Data Consortium, 2018-01-16)
    DEFT Spanish Treebank was developed by the Linguistic Data Consortium (LDC) and the Language and Computation Center (CLiC), University of Barcelona. It contains treebank annotation of international Spanish newswire text ...
  • Song, Zhiyi; Fore, Dana; Strassel, Stephanie; Lee, Haejoong; Wright, Jonathan (2019-03-27)
    BOLT English SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of naturally-occurring Short Message Service (SMS) and Chat (CHT) data collected through data donations and live collection involving ...
  • Tracey, Jennifer; Lee, Haejoong; Strassel, Stephanie (2019-03-27)
    BOLT English Discussion Forums was developed by the Linguistic Data Consortium (LDC) and consists of 830,440 discussion forum threads in English harvested from the Internet using a combination of manual and automatic ...
  • Unknown author (2020-02-27)
  • Unknown author (2021-12-13)
  • Unknown author (Slator, 2022)
    The Slator Machine Translation Expert-in-the-Loop Report provides a comprehensive view on the interaction between human experts and machines in translation production. In this report, we investigate the different ways that ...