Fuxi: A fault-tolerant resource management and job scheduling system at internet scale
- Submitting institution
- The University of Leeds
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- UOA11-1585
- Type
- E - Conference contribution
- DOI
- 10.14778/2733004.2733012
- Title of conference / published proceedings
- Proceedings of the VLDB Endowment
- First page
- 1393
- Volume
- 7
- Issue
- 13
- ISSN
- 2150-8097
- Open access status
- Out of scope for open access requirements
- Month of publication
- August
- Year of publication
- 2014
- URL
- http://www.vldb.org/pvldb/vol7/p1393-zhang.pdf
- Supplementary information
- -
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
- 5
- Research group(s)
- E - DSS (Distributed Systems and Services)
- Citation count
- 43
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- This paper reports our work with Alibaba on Fuxi, an advanced job scheduling system for massive-scale clusters, with innovative algorithms for incremental and efficient resource management. Whereas Hadoop’s YARN (2013) could scale to only 4,000 servers with acceptable performance, Fuxi scales to 8,000 servers and holds the 2015 world record for sorting 100 TB of data (within 400 seconds, http://sortbenchmark.org). This work is positioned amongst the world-leading systems (Google’s Borg, Microsoft’s Autopilot, Facebook’s Tupperware) by Google’s white paper (John Wilkes), a Google keynote (Jeff Dean) and an IBM report (Berthold Reinwald), and led to a £1M+ grant (2020-23, EP/T01461X/1).
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -