Resources

Accessing the corpus

The annotated resources are distributed here in two formats: a TEI-compliant XML format (.xml files) and an Analec format (.ec files).

The .ec files can be browsed and searched through the Analec tool. A quick guide to using Analec for browsing and searching the Modal Corpus is available here.
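
The TEI-compliant .xml files can also be inspected with any standard XML library. The following is only a minimal sketch in Python: it counts how often each element occurs in a TEI document, which can be a useful first look before writing a more specific query. The embedded sample document, its `u` utterance elements and the function name `count_tei_elements` are illustrative assumptions and do not describe the actual markup of the Modal Corpus files; only the TEI namespace URI is standard.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI P5 namespace URI.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample, NOT taken from the Modal Corpus: the real files
# may use different elements and attributes.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Hypothetical sample</title></titleStmt></fileDesc></teiHeader>
  <text><body><u who="#A">maybe it is true</u><u who="#B">I think so</u></body></text>
</TEI>"""


def count_tei_elements(xml_text: str) -> Counter:
    """Return a count of element names found in a TEI document string."""
    root = ET.fromstring(xml_text)
    prefix = "{" + TEI_NS + "}"
    counts = Counter()
    for elem in root.iter():
        tag = elem.tag
        # ElementTree stores namespaced tags as "{uri}name"; keep the name only.
        if tag.startswith(prefix):
            tag = tag[len(prefix):]
        counts[tag] += 1
    return counts


if __name__ == "__main__":
    for tag, n in count_tei_elements(SAMPLE).most_common():
        print(tag, n)
```

To apply the same function to a downloaded file, read the file content into a string first and pass it in place of `SAMPLE`.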

Description of the corpus

The excerpts annotated for epistemic modality were selected from the Santa Barbara Corpus of Spoken American English (for English), the ESLO Corpus, the OTG Corpus and the AccueilUBS Corpus (for French), and the VoLip Corpus (for Italian). The selection aimed at obtaining three resources balanced in terms of size (about 20,000 words per language) and communication context (public vs. private vs. broadcasting vs. family context; young and adult participants; free and directed turn-taking; two or more than two participants).

Public context, directed turn-taking, adult participants
Italian   MODAL- IT01,LIPRC10, 2149 words
English   MODAL- EN01,SBC039
French    MODAL- FR01,ESLO2SOUTENANCE, 2604 words
Public context, free turn-taking, adult participants
Italian   MODAL- IT02,LIPRA7, 2770 words
English   MODAL- EN02,SBC008
French    MODAL- FR02a1,ESLO24H_apresmiditravail
          MODAL- FR02a2,ESLO24H_apresmiditravail
          MODAL- FR02b,OTG1AP0316
          MODAL- FR02c,UBS028
Broadcasting, free turn-taking, adult participants
Italian   MODAL- IT03,LIPRE11, 4019 words
English   MODAL- EN03,SBC053
French    MODAL- FR03,ESLO2MEDIA13
Private context, free turn-taking, young participants
Italian   MODAL- IT04,LIPRA1, 3329 words
English   MODAL- EN04,SBC007
French    MODAL- FR04,ESLO224DEBUTJOURNEE
Family context, free turn-taking, young and adult participants
Italian   MODAL- IT05,LIPFA1, 2000 words
English   MODAL- EN05,SBC019
French    MODAL- FR05,ESLO2REP0102
Private context, free turn-taking, adult participants
Italian   MODAL- IT06,LIPRA3, 5398 words
English   MODAL- EN06,SBC002
French    MODAL- FR06,ESLO2REP22

Licence

The Modal Corpus is published under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence.

Please follow these instructions when using data drawn from the Modal Corpus.