The Corpus

The Modal Corpus consists of three equivalent resources of English, French and Italian dialogues annotated for epistemic modality. The dialogues are drawn from the Santa Barbara Corpus of Spoken American English (for English), the ESLO Corpus, the OTG Corpus and the Accueil UBS Corpus (for French), and the VoLip Corpus (for Italian).

We annotated about 20,000 words per language, for a total of 2824 epistemic constructions (833 in the English corpus, 1271 in the French corpus and 720 in the Italian corpus).

Theoretical Underpinnings

To annotate epistemic modality in dialogues, we adopted a communitarian (Stalnaker 1978), dynamic (Groenendijk & Stokhof 1991) and interactionist (Ginzburg 2012) approach to semantics. This approach led us to refine the traditional definition of epistemicity: we propose that any construction that explicitly signals the process of shared attribution of a truth value to the propositional tokens composing a discourse should be considered an epistemic construction.


We did not want to annotate a predetermined list of epistemic constructions and assign functions to them.

Rather, we provided a theoretically meaningful, solid and operationalizable definition of epistemic modality and, on this basis, identified the linguistic constructions that realize epistemic modality in each of the three languages.

In a second step, we specified the semantic, syntactic and pragmatic functions of each epistemic construction identified.


The annotated corpus is available HERE.


The Modal Project was funded by “La Maison des Sciences de l’Homme du Val de Loire”, the IRCOM Consortium, the “Laboratoire Ligérien de Linguistique – UMR7270” and the Groningen Meaning Bank.