Corpws Cenedlaethol Cymraeg Cyfoes (CorCenCC) - the National Corpus of Contemporary Welsh
- Submitting institution
- Swansea University / Prifysgol Abertawe
- Unit of assessment
- 27 - English Language and Literature
- Output identifier
- 55093
- Type
- S - Research data sets and databases
- DOI
- 10.17035/d.2020.0119878310
- Location
- https://www.corcencc.org/
- Month
- October
- Year
- 2020
- URL
- https://www.corcencc.org/
- Supplementary information
- -
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- Yes
- Number of additional authors
- 26
- Research group(s)
- -
- Proposed double-weighted
- Yes
- Double-weighted statement
- This submission is the main output from a major AHRC/ESRC-funded project (ES/M011348/1).
It constitutes a data set of 14,338,149 tokens (c. 11.2 million words), collected according to a principled sampling frame and processed through anonymisation, transcription, semantic tagging (using the bespoke tool SemCyTag) and Part-of-Speech (POS) tagging (using the bespoke tool CyTag). In addition to the corpus (the first of its kind for the Welsh language), the output comprises supporting documentation and information, among them a project report of approximately 19,000 words. All elements of the output are presented bilingually (in English and Welsh).
- Reserve for an output with double weighting
- No
- Additional information
- The CorCenCC submission (National Corpus of Contemporary Welsh / Corpws Cenedlaethol Cymraeg Cyfoes) includes a data set (c. 11.2 million words); bespoke technical tools, namely a Part-of-Speech tagset and tagger (CyTag), an adapted semantic tagger (CySemTag), a crowdsourcing app and transcription conventions; the Y Tiwtiadur pedagogic toolkit; the Yr Amliadur word frequency lists; and supporting documentation and information, including a project report. The corpus and associated documentation are accessed through the CorCenCC webpages (https://www.corcencc.org/ and https://www.corcencc.cymru/), and the technical tools are available at https://github.com/CorCenCC. These websites are external to Swansea University. CorCenCC can be explored via the website, and the underlying data can be requested via a link on the webpage.
The three co-founders and co-creators of CorCenCC (Fitzpatrick, Knight and Morris) were jointly responsible for decisions about, and the strategic management and co-ordination of, all work packages and the project team of around 40 individuals. In addition to the operational management of the project, Fitzpatrick's specific role focused on effective communication, creative troubleshooting, and dynamic, agile decision-making. Besides her key role in the creation and strategic leadership of the project, Fitzpatrick led on the pedagogical and lexical aspects of the research portfolio. She was also a key author of the central project documentation, including the main project report, which documents the research process, provides a detailed overview of CorCenCC tools and outputs, and relates their applications to a range of user groups.
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -