CochleaNet: A robust language-independent audio-visual model for real-time speech enhancement
- Submitting institution
- University of Wolverhampton
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 1664
- Type
- D - Journal article
- DOI
- 10.1016/j.inffus.2020.04.001
- Title of journal
- Information Fusion
- Article number
- -
- First page
- 273
- Volume
- 63
- Issue
- -
- ISSN
- 1566-2535
- Open access status
- Compliant
- Month of publication
- April
- Year of publication
- 2020
- URL
- -
- Supplementary information
- https://ars.els-cdn.com/content/image/1-s2.0-S1566253520302475-mmc1.xml
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- Yes
- Number of additional authors
- 3
- Research group(s)
- -
- Citation count
- 1
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- This pioneering speech enhancement (SE) research demonstrates the potential of context-aware, lip-reading neural networks for seamless multi-modal hearing aid (HA) operation. The ASPIRE dataset used in this work is a first-of-its-kind audio-visual (AV) speech corpus, recorded in real noisy environments, and is available online to the wider community for evaluating novel SE frameworks in real-world settings. The research demonstrates superior performance over existing SE approaches and challenges the popular belief that the scarcity of multi-lingual, large-vocabulary corpora is a major bottleneck to building robust language-, speaker- and noise-independent SE systems. It formed the basis for the awarded EPSRC programme grant COG-MHEAR.
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -