OcOr: A Corpus of Occitan Oral Narratives
- Submitting institution
- Queen's University of Belfast
- Unit of assessment
- 26 - Modern Languages and Linguistics
- Output identifier
- 165783513
- Type
- S - Research data sets and databases
- DOI
- 10.5281/zenodo.1451753
- Location
- Europe
- Month
- October
- Year
- 2018
- URL
- https://doi.org/10.5281/zenodo.1451753
- Supplementary information
- -
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
- 1
- Research group(s)
- -
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- OcOr was created by and for the ExpressioNarration project, funded by Horizon 2020 (Carruthers was PI; Vergez-Couret was the Marie Curie Fellow), and is the first digitised annotated corpus of Occitan oral narrative. It contains 58 stories from the Languedocien and Gascon dialects across three sub-corpora, designed to vary in storytellers’ sources, storytelling practice and mode of transmission, thus facilitating a new approach to analysing the relationship between temporality and orality. OWT is a published written corpus with traditional oral sources, OOT is a traditional oral corpus with oral sources, and OOC is a contemporary oral corpus involving both oral and written sources. The header for each narrative contains speaker metadata (e.g. age, gender, education, dialect, location), story metadata (Aarne-Thompson classification) and the annotation taxonomy for a range of linguistic features (discourse relations, tense, frames and connectives). Prior to annotation, the research process for all sub-corpora involved adapting and applying POS-tagging tools to a multi-dialectal corpus. In addition, OWT required manual standardisation of dialectal spelling (to facilitate POS-tagging), OOT required the transcription of archival field recordings, and OOC required the organisation of two contemporary storytelling events in Toulouse and the subsequent transcription of the data. The work was shared equally (50% each), with the authors taking different roles in the research process. Carruthers led the research and was responsible for the core design of the corpus. Building on previous research by Carruthers on French (i.e. construction of the French Oral Narrative Corpus, 2013), the authors co-designed the annotation taxonomy using the Text Encoding Initiative guidelines (TEI P5). The methodology is discussed in Carruthers and Vergez-Couret (Corpus, 2018).
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -