Natural Language Processing for Text Prediction and Feedback (TPAF): machine learning research saves 100,000 years in mobile phone typing time and provides feedback to language learners in less than 15 seconds
1. Summary of the impact
Research in machine learning (ML) and natural language processing carried out by Cambridge’s Natural Language Processing (NLP) Group, led by Professor Ted Briscoe, has:
provided the foundational ML techniques and data that were used in the development of the word prediction app SwiftKey, which was acquired by Microsoft in 2016 for USD 250 million and has now been incorporated into its flagship mobile app, “SwiftKey Keyboard” for Android and iPhone, saving its 300 million users more than 100,000 years in combined typing time.
provided the prototype technology for the “Write & Improve” tool which has been used by over 500,000 learners of English worldwide to obtain feedback on their texts in less than 15 seconds, helping to make language learning more efficient for learners and teachers.
2. Underpinning research
Research from Cambridge’s Natural Language Processing (NLP) Group has led to the development of Machine Learning (ML) based language models and classifiers for text entry prediction, and for automated assessment and feedback on writing style.
From 2011 onwards, Briscoe and his team developed unsupervised language modelling integrated with supervised (sequential) classification, combining general background information about language with focused, task-specific foreground data. Applications explored included e-mail spam filtering and document topic classification. Later work produced state-of-the-art results for grammatical error detection in non-native written and spoken language using neural sequence models with auxiliary language modelling objectives [R1,R2,R3].
Text entry prediction
For applications such as email filtering, the team explored means of scaling down the technology to a point where it could be used on smaller devices such as smartphones.
They also developed techniques for word prediction (as opposed to character/word completion prediction) based initially on Bloom Filters [R4] but later on neural language model compression [R1].
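As a rough illustration of why a Bloom filter suits a memory-constrained device, the sketch below stores n-grams in a fixed-size bit array and uses membership tests to shortlist next-word candidates. It is a generic textbook construction, not the team's implementation: the class `BloomFilter`, the helper `predict_next`, the filter size, the hash scheme and the toy phrases are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function (assumed scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def predict_next(context, vocabulary, seen_ngrams):
    """Keep the vocabulary words whose (context + word) n-gram was probably seen."""
    return [w for w in vocabulary if " ".join(context + [w]) in seen_ngrams]


# Hypothetical toy data: three trigrams stand in for a large n-gram collection.
seen = BloomFilter()
for ngram in ["see you soon", "see you later", "thank you very"]:
    seen.add(ngram)

candidates = predict_next(["see", "you"], ["soon", "later", "very"], seen)
```

The filter never stores the n-grams themselves, only bits, which is what keeps the memory footprint small; the price is an occasional false positive, so a real system would need some way to rank or verify the shortlisted candidates.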
Briscoe’s STFC-PIPSS technology transfer grant ‘Scalable and Robust Grid-based Text Mining of Scientific Papers’, held between Cambridge University and the group’s technology and knowledge company iLexIR, used the European Grid (the distributed computational network assembled to process the large volumes of data generated by the Large Hadron Collider) to gather around a trillion words (25 terabytes) of 5-gram sequences of fluent text from the WWW and bucket them into distinct languages with frequency counts. These data, in combination with the techniques described above, were used by Briscoe and his ex-student Dr Ben Medlock (also then working for iLexIR) to train the prototype of the text entry prediction engine behind SwiftKey.
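To make concrete how n-gram frequency counts bucketed by language can drive word prediction, here is a minimal sketch. It is not the SwiftKey engine: the function names `count_ngrams` and `predict`, the pre-assigned language tags and the toy sentences are assumptions for illustration, and the toy uses trigrams (`n=3`) where the collected data used 5-grams.

```python
from collections import Counter, defaultdict


def count_ngrams(tagged_sentences, n=5):
    """Count, per language, how often each word follows each (n-1)-word context."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for lang, sentence in tagged_sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[lang][context][tokens[i + n - 1]] += 1
    return counts


def predict(counts, lang, context, k=3):
    """Return up to k most frequent next words for the context in that language."""
    followers = counts.get(lang, {}).get(tuple(context), Counter())
    return [word for word, _ in followers.most_common(k)]


# Hypothetical toy corpus; language tags are assumed to be already assigned.
corpus = [
    ("en", "see you soon"),
    ("en", "See you soon my friend"),
    ("en", "see you later"),
    ("fr", "à tout à l'heure"),
]
counts = count_ngrams(corpus, n=3)
print(predict(counts, "en", ["see", "you"]))  # → ['soon', 'later']
```

Raw counts like these are only the starting point; a table built from a trillion words would not fit on a phone without the kind of compression discussed above.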
Analysis of writing style
The team also explored the task of automated feedback on writing style and how it can be applied to the process of learning a foreign language [R2].
In 2013, Briscoe co-founded the Institute for Automated Language Teaching and Assessment (ALTA), administered by the Department of Computer Science and Technology, with the support of Cambridge Assessment (CA). Cambridge Assessment owns and operates the University’s three exam boards, employing over 2,500 people to develop and deliver a wide range of national and international examinations to educational establishments around the world, including 170 countries and 8 million learners.
ALTA researches and develops tools to support learning of both written and spoken English and provides proof of concept and prototype software. It is run by an interdisciplinary research team drawing on expertise from across the University of Cambridge; Professor Briscoe served as the first Director of the Institute and was succeeded by Professor Paula Buttery in 2018. To date, research by ALTA has led to over 120 internationally-reviewed publications which have informed the development of the 'X & Improve' products. The NLP ALTA team has developed automatic marking and feedback systems for writing, based on supervised ML techniques, to predict the language proficiency scores and suggested improvements that would have been produced by human assessment.
The ML systems are trained with data from CA, including the Cambridge Learner Corpus. This is the world’s largest collection of exam papers taken by English language learners around the world: almost 65 million words gathered over a 20-year period from tests taken by real exam candidates living in 217 different countries or territories with 148 different native languages. Each test has been transcribed and information gathered about the learner’s age, language and grade achieved. Crucially, all errors (grammar, spelling, misuse, word sequences, and so on) have been annotated so that a computer can process the natural language used by the learner.
Several of the ML systems have been incorporated into the tool Write & Improve [R5]. This includes three components developed by the team:
1) a scoring model that indicates the Common European Framework of Reference (CEFR) level of the submitted text [R3,R4];
2) a problematic text region classifier that highlights areas of text that a learner should focus on improving [R2]; and
3) a correction suggestion model that provides the learner with guidance and feedback [R1,R5].
3. References to the research
All research outputs marked with * have been through a rigorous peer-review process.
* [R1] Helen Yannakoudakis, Marek Rei, Øistein E. Andersen and Zheng Yuan. “Neural Sequence-Labelling Models for Grammatical Error Correction”. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2017. https://www.aclweb.org/anthology/D17-1297.pdf
* [R2] Mariano Felice, Zheng Yuan, Øistein E. Andersen, Helen Yannakoudakis and Ekaterina Kochmar. “Grammatical error correction using hybrid systems and type filtering”. CoNLL Shared Task Proceedings, 2014. https://pdfs.semanticscholar.org/7041/280a628db8217ccdbb130b7211cda359e987.pdf
* [R3] Yannakoudakis, H., Briscoe, E.J. and Medlock, B. “A New Dataset and Method for Automatically Grading ESOL Texts”. 49th Annual Mtg of Assoc. for Comp. Linguistics, Portland, OR, 2011. https://www.aclweb.org/anthology/P11-1019.pdf
[R4] Briscoe, E.J., Medlock, B. and Andersen, O. “Automated Assessment of ESOL Free Text Examinations”. University of Cambridge, Computer Laboratory, TR-790, 2010. https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-790.pdf
* [R5] Helen Yannakoudakis, Øistein E. Andersen, Ardeshir Geranpayeh, Ted Briscoe and Diane Nicholls. “Developing an Automated Writing Placement System for ESL Learners”. Journal of Applied Measurement in Education, 2018. https://www.cl.cam.ac.uk/~hy260/WI-cefr.pdf
Grants
Parker, A., Briscoe, E, J., and Hobson, M. STFC - PIPSS technology transfer grant: ‘Scalable and Robust Grid-based Text Mining of Scientific Papers’, 2008-09.
4. Details of the impact
Impact 1: The team’s research was critical to the development of the keyboard app SwiftKey, which was acquired by Microsoft in 2016 for USD 250 million:
Research by Briscoe’s NLP group on language modelling and classification and on compression of such models [R4] provided the foundational ML techniques that were used in the development of SwiftKey. The SwiftKey app was designed to pick up the patterns in people’s language use and, through Machine Learning (ML) techniques, to be very accurate in predicting what people were most likely to write next. Language data was gathered by Briscoe and his colleagues as part of an STFC grant [E1], and iLexIR, a technology and knowledge transfer company co-founded by Briscoe, provided the data to the 2009 start-up (co-founded by ex-Cambridge students Ben Medlock and John Reynolds) that became SwiftKey. This data was used to train the language prediction models in SwiftKey’s new keyboard application and, as the co-founder and CTO of SwiftKey confirms, “made a huge difference at a critical early stage” [E1].
SwiftKey was first released as an exclusive for Android Market in July 2010, followed by an iOS release in September 2014 after Apple allowed third-party keyboard support. By June 2014, users were estimated to have saved half a trillion keystrokes, and SwiftKey had been the top paid Android app for two years running [E2]. By 2016 SwiftKey software was installed on more than 300 million devices and it was estimated that its users had saved nearly 10 trillion keystrokes, across 100 languages, saving more than 100,000 years in combined typing time [E3].
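As a rough consistency check on these figures, and taking the rounded values of 10 trillion keystrokes and 100,000 years at face value (an assumption, since the underlying estimate is not given here), the implied saving per keystroke is:

```latex
\[
t \approx \frac{10^{5}\,\text{years} \times 3.156 \times 10^{7}\,\text{s/year}}{10^{13}\,\text{keystrokes}}
  \approx 0.32\,\text{s per keystroke}
\]
```

That is roughly a third of a second per avoided keystroke, which is a plausible order of magnitude for typing on a touchscreen.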
The success of SwiftKey led to its acquisition by Microsoft in 2016 for USD 250 million [E4]. The software was incorporated into Microsoft’s flagship mobile app, SwiftKey Keyboard for iPhone and Android (500 million installs by 2020), which remains popular today [E4].
Impact 2: The team’s research on automatic feedback and assessment of texts for “English for Speakers of Other Languages” has helped over 500,000 learners and reduced the workload of teachers.
The NLP group’s knowhow and the ALTA prototypes for automated writing assessment were the basis of the 2014 University spin-out “English Language iTutoring” (ELiT).
The first production version of the assessment software was released by ELiT as a free cloud-based service, “Write & Improve”, in September 2016. Within a year of its launch, it was used by over 650,000 people in 225 countries with well over a million pieces of writing submitted and checked [E5].
The ML technology enables Write & Improve to assess the writing and estimate a proficiency level, to identify errors and make suggestions, and to give supportive feedback - all in under 15 seconds [E6]. The learner is able to make corrections and resubmit their writing repeatedly, and the suggested improvements will change with each submission because the feedback is scaffolded.
Testimonials from students provide evidence of the success of Write & Improve in creating a free, easy-to-use, low-stakes, friendly, online environment that encourages students to practise, reflect on, revise and improve their writing independently. “The best thing is that we have the opportunity to edit our work whenever we want. It's good because we have time to think about our mistakes and then correct them.” “It's clear, easy to understand, but makes you think as well. I was pleasantly surprised when the system gave me the derivation of the word I used incorrectly.” [E7]
Teachers also recognise the benefit for themselves and their students. A review article on the ELT [English Language Teaching] news website explains: "I definitely recommend “Write&Improve” to all teachers and students. It is not only free and easy to use but also it helps students redraft and correct in a pleasant and stress free environment, something that most of them have never done before and this is what really makes the difference." [E8]
Write & Improve also helps to reduce a teacher’s workload. A Senior Research Manager at Cambridge Assessment described the tool as a way “to keep the learner motivated and engaged, and to help them identify and eliminate common and repeated errors, so that their teacher can focus on those issues which require ‘human’ support, such as discourse organisation, argumentation or nuance” [E6]. Two Turkish universities that have been using the tool reported that “Teachers found it useful for drafting, focusing on language outside of class and practicing language. It allowed them to focus on different tasks and provide more salient feedback.” Students were using the tool 30+ times to redraft homework. This allowed them to correct many issues before handing in the assignment, so that less teacher feedback was needed and the “demotivating” effect of many minor corrections was reduced [E9].
In December 2019, ELiT was fully acquired by Cambridge Assessment and Cambridge University Press [E10] [text removed for publication] [E11]. The press release described ELiT as ‘world-class experts in using AI to support English learning…they have already worked with Cambridge English on ground-breaking projects and resources for learners’ [E10]. ALTA continues to support the R&D base of the company [E12].
5. Sources to corroborate the impact
[E1]. Letter from co-founder and former CTO of SwiftKey, 9 May 2019
[E2]. “How Top-Selling Genius Keyboard SwiftKey Got So Smart:”
[E3]. “Microsoft acquires SwiftKey in support of re-inventing productivity ambition:” https://blogs.microsoft.com/blog/2016/02/03/microsoft-acquires-swiftkey-in-support-of-re-inventing-productivity-ambition/
[E4]. SwiftKey gets rebranded as Microsoft SwiftKey Keyboard https://www.androidheadlines.com/2020/05/microsoft-swiftkey-keyboard-rebrand
[E5]. Cambridge Assessment Annual Review 16-17. See page 25. https://www.cambridgeassessment.org.uk/Images/463822-annual-review-16-17.pdf
[E6]. Combining the Power of AI with the Experience of Examiners https://www.cambridgeassessment.org.uk/insights/keeping-artificial-intelligence-human-combining-the-power-of-ai-with-the-experience-of-examiners/
[E7]. What learners have said https://www.cambridgeenglish.org/learning-english/free-resources/write-and-improve/
[E8]. ELT News Review of Write and Improve https://www.eltnews.gr/news/1493-review-cambridge-english-write-improve
[E9]. Letter from co-founder and CTO of ELiT
[E10]. Acquisition of ELiT. See page 7. https://www.cambridgeassessment.org.uk/Images/579483-achieve-april-2020.pdf
[E11]. [Text removed for publication]
[E12]. ALTA continues to support the R&D base https://englishlanguageitutoring.com/our-rd/
Additional contextual information
Grant funding
Grant number | Value of grant |
---|---|
ST/G003599/1 | £83,897 |