Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code
- Submitting institution
- University of Edinburgh
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 164737049
- Type
- E - Conference contribution
- DOI
- 10.1145/3377811.3380342
- Title of conference / published proceedings
- ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
- First page
- 1073
- Volume
- -
- Issue
- -
- ISSN
- -
- Open access status
- -
- Month of publication
- June
- Year of publication
- 2020
- URL
- -
- Supplementary information
- -
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
- 4
- Research group(s)
- B - Data Science and Artificial Intelligence
- Citation count
- -
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- Statistical language modeling techniques have been applied to large source code corpora, yielding a variety of new software development tools for code suggestion, readability improvement, and API migration. This paper was the first to show that open-vocabulary models, which do not limit the set of identifiers that can be generated, benefit machine learning on software. It won an ACM Distinguished Paper Award (top 10% of accepted papers). Around the same time, several industrial code autocompletion systems, such as IntelliCode Compose (Microsoft Research) and Deep TabNine (from the start-up TabNine), developed their own open-vocabulary models using similar principles.
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -