Intonational Variation in Arabic Corpus
- Submitting institution
University of York
- Unit of assessment
- 26 - Modern Languages and Linguistics
- Output identifier
- S - Research data sets and databases
- Colchester, Essex
- Supplementary information
- Request cross-referral to
- Output has been delayed by COVID-19
- COVID-19 affected output statement
- Forensic science
- Number of additional authors
- Research group(s)
- Proposed double-weighted
- Double-weighted statement
- This is a large corpus based on a multi-year project, opening up research possibilities across several Arabic varieties. It represents a dozen speakers of Arabic from each of 8 countries, with Morocco represented three times to capture age and bilingual variation, and the speech was elicited in five different tasks. The scope of the database and the amount of work put into making it fully functional and accessible are comparable with those of a large book project.
- Reserve for an output with double weighting
- Additional information
- This is a large corpus of parallel speech data, elicited for the purposes of intonational analysis, from 12 speakers each (6 female, 6 male) in 10 datasets across 8 regionally defined varieties of Arabic. (Morocco is represented by additional datasets to capture age and bilingual variation.) We used established elicitation techniques, modelled on earlier ESRC-funded work on English but adapted to handle lexical variation between dialects and the fact that dialectal Arabic is typically unwritten. A further technical innovation was the additional control of stressed syllable position in the scripted data stimuli.
The IVAR elicitation instruments yield a multi-level corpus ranging from fully scripted read speech (scripted dialogue and read narrative) to (semi-)spontaneous unscripted speech (narrative retold from memory, map tasks and free conversation). We worked with a local collaborator in each recording location to recruit an opportunity sample of participants, controlling for age, gender and first-language dialect of Arabic. All participants were aged 18 or over and provided informed consent for the use and distribution of their speech data. Recordings were made on location in the Middle East and North Africa, in the speakers' town or city of residence, except that speakers originally from Damascus and Baghdad were recorded in Amman, Jordan, because of the security situation. A paid local fieldworker, a first-language speaker of the relevant dialect, ran the recording sessions using high-quality recording equipment and headworn microphones (.wav, 44.1 kHz, 16-bit). Pairwork tasks were recorded in stereo on separate tracks to facilitate separate analysis of each individual's speech.
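Because each speaker in a pairwork recording sits on their own stereo track, the two voices can be separated before analysis. The following is a minimal sketch of one way to do that, assuming 16-bit stereo .wav input as described above; the function name `split_stereo` and the file paths are hypothetical and are not part of the corpus documentation.

```python
import wave
from array import array


def split_stereo(src_path, left_path, right_path):
    """Write each channel of a 16-bit stereo WAV file to its own mono file.

    Hypothetical helper: returns the sample rate of the source file.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Stereo frames are interleaved: left, right, left, right, ...
    samples = array("h")
    samples.frombytes(frames)

    channels = ((left_path, samples[0::2]), (right_path, samples[1::2]))
    for path, channel in channels:
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(channel.tobytes())
    return rate
```

A call such as `split_stereo("maptask.wav", "speaker_a.wav", "speaker_b.wav")` (file names invented for illustration) would leave one mono file per speaker at the original sample rate. Which track holds which speaker is not stated here and would need to be checked against the corpus documentation.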
The full dataset has been downloaded by 23 unique registered users, and portions of it by 52 unique registered users. The documentation, which is freely downloadable without registration and includes the original data elicitation instruments (illustrated narrative, Arabic map task) devised for the study, has been downloaded 1278 times [ReShare metrics, 31.12.2020].
- Author contribution statement
- English abstract