Corpus Information
To date, the corpus includes 11 oral texts delivered between March 3rd and May 16th, 2020 by the Italian Prime Minister (Mr. Giuseppe Conte) and the President of the Italian Republic (Mr. Sergio Mattarella).
Together they amount to 1 hour, 59 minutes and 47 seconds of transcribed public speech.
All transcripts have been aligned to the speech signal with ELAN and are distributed as ELAN annotation files (.eaf).
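ELAN .eaf files are plain XML: a TIME_ORDER block maps time-slot IDs to millisecond offsets, and each TIER holds ALIGNABLE_ANNOTATION elements that point to a start and an end slot. The sketch below reads the time-aligned segments with Python's standard library only. It is not part of the corpus distribution; the tier name "transcript" and the Italian sentence in SAMPLE are hypothetical placeholders, not actual corpus content.

```python
import io
import xml.etree.ElementTree as ET


def read_eaf_segments(source):
    """Return (tier_id, start_ms, end_ms, text) tuples for time-aligned annotations."""
    root = ET.parse(source).getroot()
    # Map TIME_SLOT_ID -> offset in milliseconds; unaligned slots have no TIME_VALUE.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations anchored to unaligned slots
            text = ann.findtext("ANNOTATION_VALUE", default="")
            segments.append((tier.get("TIER_ID"), start, end, text))
    return segments


# Hypothetical minimal .eaf content, for illustration only.
SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcript">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>Buonasera a tutti.</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    for tier_id, start, end, text in read_eaf_segments(io.StringIO(SAMPLE)):
        print(f"{tier_id}\t{start}-{end} ms\t{text}")
```

With a real file, pass its path instead of the StringIO object; ET.parse accepts both.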
Moreover, the oral texts have been lemmatized, part-of-speech tagged and parsed with the LinguA pipeline, using the standard Penn Treebank tagset.
All metadata are stored in XML format, following the IMDI (ISLE Meta Data Initiative) definitions.
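IMDI metadata files are XML documents whose root element is METATRANSCRIPT, usually in the IMDI namespace, with a Session element carrying fields such as Name, Title and Date. The following is a minimal sketch for pulling those three fields out with the standard library; it matches tags by local name so it works whether or not the namespace is declared. The SAMPLE values are hypothetical placeholders and do not come from the corpus files, whose exact content may differ.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop a '{namespace}' prefix so lookups work with or without the IMDI namespace."""
    return tag.rsplit("}", 1)[-1]


def read_imdi_session(source):
    """Return the Name, Title and Date fields of the first IMDI Session element."""
    root = ET.parse(source).getroot()
    session = next((el for el in root.iter() if local(el.tag) == "Session"), None)
    if session is None:
        return {}
    fields = {}
    for child in session:
        name = local(child.tag)
        if name in ("Name", "Title", "Date"):
            fields[name] = (child.text or "").strip()
    return fields


# Hypothetical IMDI session record, for illustration only.
SAMPLE = """<METATRANSCRIPT xmlns="http://www.mpi.nl/IMDI/Schema/IMDI" Type="SESSION">
  <Session>
    <Name>example_session</Name>
    <Title>Example title</Title>
    <Date>2020-03-09</Date>
  </Session>
</METATRANSCRIPT>"""

if __name__ == "__main__":
    print(read_imdi_session(io.StringIO(SAMPLE)))
```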



Data Collection: Gloria Gagliardi, Alice Suozzi
Corpus Annotation: Gloria Gagliardi