Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
- Submitting institution
-
University of Edinburgh
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 178562818
- Type
- E - Conference contribution
- DOI
-
10.18653/v1/P19-1580
- Title of conference / published proceedings
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (long papers)
- First page
- 5797
- Volume
- -
- Issue
- -
- ISSN
- -
- Open access status
- -
- Month of publication
- July
- Year of publication
- 2019
- URL
-
-
- Supplementary information
-
-
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
-
4
- Research group(s)
-
D - Language, Interaction and Robotics
- Citation count
- 18
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- The paper is the first to demonstrate redundancy and emerging specialization in an extremely popular and effective class of models, Transformers. Transformers are large neural networks consisting of many attention sub-components, called 'heads'. We study machine translation and show that only a small subset of heads is important, and that these heads are mostly linguistically interpretable. These findings have motivated a large body of follow-up work (e.g., from the University of Washington, MIT, and Google), including (1) methods that hand-craft 'heads', producing cheap and small models; (2) techniques for 'pruning' or adaptively changing model size; (3) follow-on studies on other tasks and models.
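The pruning idea summarized above can be sketched as a per-head gate. The following is a minimal illustrative sketch, not the paper's implementation: the function names and toy one-dimensional 'heads' are assumptions for illustration, and the paper itself learns continuous stochastic gates under an L0-style penalty rather than applying fixed binary masks.

```python
# Toy sketch of head pruning: each head's output is multiplied by a gate,
# so a head whose gate is 0 contributes nothing and can be removed.

def head_output(x, w):
    # hypothetical toy "head": a dot-product projection of the input vector x
    return sum(wi * xi for wi, xi in zip(w, x))

def gated_multi_head(x, head_weights, gates):
    # one scalar output per head; a gate of 0 prunes that head entirely
    return [g * head_output(x, w) for g, w in zip(gates, head_weights)]

x = [1.0, 2.0]
heads = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]
print(gated_multi_head(x, heads, [1, 0, 1]))  # prints [-0.5, 0.0, 2.0]
```

In a real Transformer the gate would scale each head's output vector before the heads are concatenated and projected, but the gating principle is the same.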
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -