From word embeddings to document distances
- Submitting institution
- University College London
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 14697
- Type
- E - Conference contribution
- DOI
- -
- Title of conference / published proceedings
- 32nd International Conference on Machine Learning, ICML 2015
- First page
- 957
- Volume
- 2
- Issue
- -
- ISSN
- 2640-3498
- Open access status
- Out of scope for open access requirements
- Month of publication
- July
- Year of publication
- 2015
- URL
- -
- Supplementary information
- -
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
- 3
- Research group(s)
- -
- Citation count
- -
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- Proposes two new distance measures between documents. The first is the optimal transport distance between the documents’ word embeddings. The second is a lower bound on the first, for which we give a quadratic-time algorithm. Further, we devise a prefetching method that drastically reduces the computation needed to find the nearest neighbors of a query document. This work has been used by the online restaurant-reservation service OpenTable to compare restaurant reviews (http://tech.opentable.com/2015/08/11/navigating-themes-in-restaurant-reviews-with-word-movers-distance/). It is also implemented in the popular NLP library gensim (https://radimrehurek.com/gensim/auto_examples/tutorials/run_wmd.html).
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -
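The two distances summarized under "Additional information" can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on the following assumptions:

- Each document is already a normalized bag-of-words histogram, with weights summing to 1 over its distinct words.
- Every word has an embedding vector.
- The cost of moving weight between two words is the Euclidean distance between their embeddings.

The function names `pairwise_costs`, `wmd` and `relaxed_lower_bound` are hypothetical. The exact transport problem is solved with SciPy's general-purpose `linprog` (HiGHS backend) as a stand-in, not with any dedicated solver.

```python
import numpy as np
from scipy.optimize import linprog


def pairwise_costs(X1, X2):
    """Euclidean distance between every row of X1 (n, dim) and every row of X2 (m, dim)."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def wmd(d1, X1, d2, X2):
    """Optimal-transport distance between two normalized bag-of-words histograms.

    d1: (n,) word weights summing to 1, X1: (n, dim) their embeddings.
    d2: (m,) word weights summing to 1, X2: (m, dim) their embeddings.
    """
    C = pairwise_costs(X1, X2)
    n, m = C.shape
    # Flow variables T[i, j] are flattened row-major: T.ravel()[i * m + j].
    A_rows = np.kron(np.eye(n), np.ones((1, m)))  # sum_j T[i, j] = d1[i]
    A_cols = np.kron(np.ones((1, n)), np.eye(m))  # sum_i T[i, j] = d2[j]
    # Drop one column constraint: it is implied by the rest because both
    # histograms sum to 1, and removing it avoids a redundant equality.
    A_eq = np.vstack([A_rows, A_cols[:-1]])
    b_eq = np.concatenate([d1, d2[:-1]])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


def relaxed_lower_bound(d1, X1, d2, X2):
    """Lower bound on wmd() obtained by dropping one marginal constraint.

    Without the incoming constraint, each word of document 1 sends all its
    weight to its nearest word in document 2 (and symmetrically). Only the
    n x m cost matrix is needed, so the cost is proportional to the number
    of word pairs.
    """
    C = pairwise_costs(X1, X2)
    outgoing_only = float(d1 @ C.min(axis=1))  # keep sum_j T[i, j] = d1[i]
    incoming_only = float(d2 @ C.min(axis=0))  # keep sum_i T[i, j] = d2[j]
    return max(outgoing_only, incoming_only)   # the tighter of the two bounds
```

Each relaxed problem is a feasible-set enlargement of the exact one, so its optimum can only be smaller or equal. Taking the larger of the two one-sided relaxations keeps the bound as tight as this scheme allows. In a nearest-neighbor search, a cheap bound like this can be used to skip candidates whose bound already exceeds the current k-th best exact distance.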