Using Visual Speech Information in Masking Methods for Audio Speaker Separation
- Submitting institution
-
The University of East Anglia
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 182621694
- Type
- D - Journal article
- DOI
-
10.1109/TASLP.2018.2835719
- Title of journal
- IEEE Transactions on Audio, Speech, and Language Processing
- Article number
- -
- First page
- 1742
- Volume
- 26
- Issue
- 10
- ISSN
- 1558-7916
- Open access status
- Compliant
- Month of publication
- October
- Year of publication
- 2018
- URL
-
-
- Supplementary information
-
-
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
-
2
- Research group(s)
-
-
- Citation count
- 2
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- Speaker separation extracts a target speaker from a mixture of interfering speakers. The most successful methods use masking and exploit only the audio part of the speech. The significant contribution of this paper is that the proposed framework exploits visual speech information in addition to the audio. Speaker separation has application for security services who wish to ‘eavesdrop’ on conversations in which unlawful activities may be discussed. We have worked with a government agency (name confidential) on how to include the complementary information provided by audio-visual speaker separation within a broader system to extract target speech in counter-terrorism applications.
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -