Linguistics Datasets

WakeSpace Repository

Recent Submissions

  • Unknown author (2020-02-27)
  • Tracey, Jennifer; Lee, Haejoong; Strassel, Stephanie (2019-03-27)
    BOLT English Discussion Forums was developed by the Linguistic Data Consortium (LDC) and consists of 830,440 discussion forum threads in English harvested from the Internet using a combination of manual and automatic ...
  • Song, Zhiyi; Fore, Dana; Strassel, Stephanie; Lee, Haejoong; Wright, Jonathan (2019-03-27)
    BOLT English SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of naturally-occurring Short Message Service (SMS) and Chat (CHT) data collected through data donations and live collection involving ...
  • Taulé, Mariona; Martí, Maria Antonia; Bies, Ann; Gari, Aina; Nofre, Montserrat; Song, Zhiyi; Strassel, Stephanie; Ellis, Joe (Linguistic Data Consortium, 2018-01-16)
    DEFT Spanish Treebank was developed by the Linguistic Data Consortium (LDC) and the Language and Computation Center (CLiC), University of Barcelona. It contains treebank annotation of international Spanish newswire text ...
  • Unknown author (2018-01-03)