KungFu: making training in distributed machine learning adaptive
- Submitting institution
- Imperial College of Science, Technology and Medicine
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 5105
- Type
- E - Conference contribution
- DOI
- -
- Title of conference / published proceedings
- Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation
- First page
- 937
- Volume
- -
- Issue
- -
- ISSN
- -
- Open access status
- -
- Month of publication
- November
- Year of publication
- 2020
- URL
- -
- Supplementary information
- -
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
- 5
- Research group(s)
- -
- Citation count
- -
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- KungFu provides an open-source implementation (https://github.com/lsds/KungFu) of principled adaptation during the training process in machine learning (ML) frameworks (>180 GitHub stars). KungFu is being integrated with the industrial MindSpore AI platform (https://github.com/mindspore-ai) and has led to the creation of a MindSpore Special Interest Group on Adaptive Distributed Training (see the MindSpore TSC mailing list; https://gitee.com/mindspore/community/tree/master/sigs/adaptivetraining). The work resulted in follow-on research funding from Huawei (>£1M) and Alibaba ($100K), and is being commercially exploited by Huawei (Huawei Edinburgh AI Centre, contact: FoEREF@ic.ac.uk). OSDI’20 acceptance rate: 17% (of 398 submissions); Artefact Evaluated and Functional.
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -