Enhancing Feature Selection Using Word Embeddings: The Case of Flu Surveillance
- Submitting institution
- University College London
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 14262
- Type
- E - Conference contribution
- DOI
- 10.1145/3038912.3052622
- Title of conference / published proceedings
- Proceedings of the 26th International Conference on World Wide Web
- First page
- 695
- Volume
- -
- Issue
- -
- ISSN
- 0000-0000
- Open access status
- Out of scope for open access requirements
- Month of publication
- April
- Year of publication
- 2017
- URL
- -
- Supplementary information
- -
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- Yes
- Number of additional authors
- 2
- Research group(s)
- -
- Citation count
- 19
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- Health surveillance systems based on user-generated content often rely on the automatic identification of textual markers. These systems are criticized for using features that are statistically correlated with, but have no semantic relationship to, the target disease (correlation is not causation). To address this, we used a neural word embedding space trained on social media content to measure how strongly each textual feature is semantically linked to an underlying health concept. The proposed feature selection method improves disease surveillance models by up to 28%. The paper was published at an A* conference, and the method is a key component of the online influenza surveillance system adopted by Public Health England (PHE) in 2018.
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -
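
The embedding-based feature selection summarised under "Additional information" can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors, the single-term concept, the cosine-similarity score, the threshold value and the names `cosine`, `select_features` and `TOY_EMBEDDINGS` are all assumptions made for illustration. The idea shown is only that candidate textual features are kept when their embedding lies close to the embedding of a health concept.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_features(embeddings, concept_terms, threshold):
    """Return candidate terms whose similarity to the concept is >= threshold.

    The concept vector is the mean of the concept terms' embeddings
    (an assumption of this sketch). Results are sorted from most to
    least similar.
    """
    dims = len(next(iter(embeddings.values())))
    concept = [
        sum(embeddings[t][i] for t in concept_terms) / len(concept_terms)
        for i in range(dims)
    ]
    scores = {
        term: cosine(vec, concept)
        for term, vec in embeddings.items()
        if term not in concept_terms
    }
    kept = [term for term, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda term: -scores[term])


# Hypothetical toy embeddings; a real system would load vectors
# trained on a large social media corpus.
TOY_EMBEDDINGS = {
    "flu": [0.9, 0.1, 0.0],
    "fever": [0.8, 0.2, 0.1],
    "cough": [0.7, 0.3, 0.0],
    "football": [0.0, 0.1, 0.9],
    "holiday": [0.1, 0.9, 0.2],
}

selected = select_features(TOY_EMBEDDINGS, ["flu"], threshold=0.8)
print(selected)
```

In this toy setting, terms such as "football" are discarded even if their frequency happened to correlate with flu rates, because the filter depends on semantic proximity to the concept rather than on correlation with the target signal.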