Submitted outputs' details

The submitted outputs' details page lets you browse and search for outputs submitted to REF 2021.

Corpws Cenedlaethol Cymraeg Cyfoes (CorCenCC) corpus and query tools

Submitting institution: Swansea University / Prifysgol Abertawe
Unit of assessment: 26 - Modern Languages and Linguistics
Output identifier: 55093
Type: S - Research data sets and databases
DOI: 10.17035/d.2020.0119878310
Location: https://www.corcencc.org/
Month: October
Year: 2020
URL: https://www.corcencc.org/
Supplementary information: -
Request cross-referral to: -
Output has been delayed by COVID-19: No
COVID-19 affected output statement: -
Forensic science: No
Criminology: No
Interdisciplinary: Yes
Number of additional authors: 26
Research group(s): -
Proposed double-weighted: Yes
Double-weighted statement: This submission is the main output of a major four-year project funded by the AHRC/ESRC (ES/M011348/1). It constitutes a data set of 14,338,149 tokens (circa 11.2 million words), collected according to a principled sampling frame and subjected to anonymisation, transcription, semantic tagging (using the bespoke tool SemCyTag) and part-of-speech (POS) tagging (using the bespoke tool CyTag). In addition to the corpus (the first of its kind for the Welsh language), the output includes supporting documentation and information, including a project report of approximately 19,000 words. All elements of the output are presented bilingually (in English and Welsh).
Reserve for an output with double weighting: No
Additional information: -
Author contribution statement: -
Non-English: No
English abstract: -