A segmental framework for fully-unsupervised large-vocabulary speech recognition
- Submitting institution
- University of Edinburgh
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 58755330
- Type
- D - Journal article
- DOI
- 10.1016/j.csl.2017.04.008
- Title of journal
- Computer Speech and Language
- Article number
- -
- First page
- 154
- Volume
- 46
- Issue
- -
- ISSN
- 0885-2308
- Open access status
- Compliant
- Month of publication
- May
- Year of publication
- 2017
- URL
- -
- Supplementary information
- -
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
- 2
- Research group(s)
- D - Language, Interaction and Robotics
- Citation count
- 21
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- Presents an unsupervised system for full-coverage segmentation and clustering of speech into word-like units: the first such system to be tested on large-vocabulary multi-speaker data, and the first to test CAE features (developed in Output 2) in a downstream task. This paper shows that unsupervised Bayesian models can scale to complex speech datasets, and that the top-down guidance provided by the Bayesian model structure improves results compared to a purely bottom-up approach. The system outperforms a state-of-the-art baseline (which uses only bottom-up information) on single-speaker data, and outperforms a bottom-up version of our method on multi-speaker data.
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -