Spoken British National Corpus 2014
- Submitting institution
- The University of Lancaster
- Unit of assessment
- 26 - Modern Languages and Linguistics
- Output identifier
- 257281090
- Type
- T - Other
- DOI
-
-
- Location
- -
- Brief description of type
- Department of Linguistics and Modern English Language, Lancaster University, UK
- Open access status
- -
- Month
- November
- Year
- 2018
- URL
-
-
- Supplementary information
-
-
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
- 5
- Research group(s)
-
-
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- The Spoken BNC 2014 corpus, developed in collaboration with Cambridge English, is the first substantial, freely available corpus of casual conversation in UK home settings to be produced in over twenty years. The corpus comprises over 10,000,000 words of orthographically transcribed spontaneous conversation, designed around a core (a sociolinguistically balanced subset of conversations) and a mantle (a much larger body of data from across the UK, collected to a looser sampling frame). The aim was to provide a dataset from which users could either generalise about present-day spoken British English using the core, or curate, as the data permitted, more focused samples from the core and mantle together in order to explore particular research questions with a larger dataset.
The project used a public-participation-in-science framework to encourage speakers to volunteer potential recordings for the corpus. This new model of spoken corpus production allowed us to construct a spoken corpus at a fraction of the cost of comparable, much older corpora. It also led us to reflect critically on how the broadly comparable BNC 1994 spoken corpus was constructed, and to produce a full and realistic account of the difficulty of producing spoken corpora today, in a legal and ethical context very different from that of the early 1990s. Since the corpus was released, others have started to use it to gain fresh insights into change in spoken British English over the past 20 years.
The full dataset can be downloaded free of charge from http://corpora.lancs.ac.uk/bnc2014/signup.php (any user can create an account to gain access) and can also be queried, free of charge, through Lancaster’s CQPweb system and Sketch Engine. The core sample can additionally be queried through the BNCLab interface (http://corpora.lancs.ac.uk/bnclab/search).
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -