- Submitting institution
- The University of Reading
- Unit of assessment
- 11 - Computer Science and Informatics
- Summary impact type
- Technological
- Is this case study continued from a case study submitted in 2014?
- No
1. Summary of the impact
Border security is a highly complex combination of security, political, economic and technical problems which, when resolved, can reduce the barriers to faster and safer travel. Research at Reading has directly responded to the technical challenge of harmonized, secure and rapid border crossing by travellers through the development, deployment and evaluation of ground-breaking processes and technologies for traveller identification and monitoring in two EU-funded projects: FastPass and its follow-on, PROTECT. The resulting transformative traveller identification system is secure, fast, efficient, user-centric and deployable at all border crossings. A combination of proof-of-concept research and engagement with the whole industry and related policy sphere has shifted positions on border security. The Reading system has contributed to the implementation of the 2025 UK Border Strategy. Research at Reading has also been positively validated by border authorities in four EU Member States: Austria, Greece, Poland and Romania. The University of Reading’s main industry partner has benefitted through creation of a new business unit, new products and competitively won follow-on research. Reading’s work has directly informed policy makers and legislators, and new international standards have subsequently been developed in biometrics and next-generation digital travel credentials.
2. Underpinning research
Automated Border Control (ABC), specifically ABC eGates at airports, has, since the mid-2000s, enabled identity verification to be performed using passports more rapidly and accurately than traditional manual checking. However, current-day eGates still do not meet expectations with respect to performance (speed and accuracy) or availability, which impacts public acceptance. Specifically, commercial systems are limited by (1) the time taken to read the ePassport chip (typically between 15 and 20 seconds), (2) the use of a single International Civil Aviation Organisation (ICAO) mandated biometric modality (typically the face), which limits accuracy, and (3) the lack of advanced means of detecting spoofing attempts and anomalous (including evasive) behaviour. Finally, the eGate market is fragmented: no supplier provides a single harmonized architecture which can be deployed across land, sea and air border types. In 2013, Professor Ferryman and his team at Reading embarked on research on traveller identification and monitoring to help overcome these challenges through two major research efforts: EC FastPass (2013 to 2017) and EC PROTECT (2016 to 2019), selected by the EC for funding for their high innovation and impact potential.
EC FastPass: The EC FastPass project was funded by the EU (27 partners, EUR15.6m, EUR883,000 to Reading, [G1]) and coordinated by the Austrian Institute of Technology (AIT). In consultation with a wide range of stakeholders, the Reading researchers set out to develop a next-generation modular eGate that is flexible in that it can be adapted and deployed at air and sea border crossing points (with travellers ‘on foot’) as well as at land border crossing points (with travellers in vehicles). The key innovation proposed was a segregated two-step system to increase throughput: (1) pre-enrolment (with the traveller’s face as token) at a kiosk; (2) face verification within the eGate (no travel document presentation required, reducing the ‘transaction’ time). The University of Reading was a core project partner leading the largest technological workstream on traveller identification and monitoring.
Reading developed core biometric and video surveillance technology modules, with key research contributions in four major areas: (1) multibiometric optimised fusion, including a novel multimodal face and fingerprint fusion algorithm which is robust in the presence of spoofing attacks [R1]; (2) improved cross-spectral iris recognition performance when using selective feature-level fusion and without increasing the length of the iris code [R2]; (3) abnormality detection based on a novel soft computing algorithm for unsupervised abnormal behaviour-detection [R3] applied for the first time to monitoring of travellers in queues based on stereo cameras; (4) development of novel rule-based software for ABC detection and alarming which fuses information from a wide range of sources (including biometric, video analytics, travel document reading, background checks) to compute a risk score for each traveller.
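The fourth contribution, rule-based fusion of heterogeneous sources into a per-traveller risk score, can be illustrated with a minimal sketch. The weights, indicator names and override rule below are invented for illustration and are not the project's actual software:

```python
# Hypothetical sketch of rule-based risk-score fusion: each information source
# contributes a normalised indicator in [0, 1], combined by a weighted sum,
# with simple rules that raise the risk when one indicator is decisive.

def fuse_risk(biometric_mismatch, spoof_likelihood, behaviour_anomaly, watchlist_hit):
    """Combine per-traveller indicators (each in [0, 1]) into one risk score."""
    weights = {"biometric": 0.4, "spoof": 0.3, "behaviour": 0.2, "watchlist": 0.1}
    score = (weights["biometric"] * biometric_mismatch
             + weights["spoof"] * spoof_likelihood
             + weights["behaviour"] * behaviour_anomaly
             + weights["watchlist"] * watchlist_hit)
    # Rule: a confirmed watchlist hit or near-certain spoof overrides the weighted sum.
    if watchlist_hit >= 0.9 or spoof_likelihood >= 0.95:
        score = max(score, 0.9)
    return round(score, 3)

# A traveller with clean checks scores low; a suspected spoof triggers the override rule.
low_risk = fuse_risk(0.05, 0.02, 0.10, 0.0)    # weighted sum only
high_risk = fuse_risk(0.20, 0.97, 0.10, 0.0)   # spoof rule forces a high score
```

The point of the rule layer is that a single alarming source (e.g. a likely spoof) should dominate the decision even when the other, weighted evidence looks benign.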
Notwithstanding FastPass’s achievements, Ferryman identified, as a direct result of the research on traveller identification, key limitations. The eGate transaction time, albeit reduced, remained significant due to the physical gate mechanics and the process not being non-stop. Furthermore, only one or two biometric modalities are employed, limiting identification accuracy and increasing susceptibility to spoofing. Hence, Ferryman conceived a radical vision to completely eliminate eGates and create a harmonized no-gate, free-flowing border control system employing digital travel credentials and multimodal contactless biometric identification.
EC PROTECT: The resulting Reading-led EC PROTECT project (between 2016 and 2019) - 10 partners, EUR5m, EUR1.1m to Reading, [G2] - responded to this challenge and researched, devised, demonstrated and evaluated a secure, minimally-intrusive and usable free-flowing biometric-based identity confirmation system deployable at land, sea and air borders. Ferryman engaged with more than 50 stakeholders (including border authorities, travellers, legal experts, policy makers, technology and security experts) to establish the real user needs and consequently enhance acceptability of the applied research on free-flowing, biometric-based identity verification at the border. The research resulted in a set of novel concepts and processes, technical specifications, system design and implementation of an innovative two-step process: (1) A multimodal biometric enrolment kiosk; and (2) Contactless multimodal verification on-the-move, integrating travellers’ smartphones (as vectors of biographic and biometric data) and advanced passports (including secure ultra-high frequency (UHF) technology).
Reading’s research focussed on the development of a novel biometric corridor solution for on-the-move traveller identification integrating multimodal biometric fusion, counter spoofing and video analytics (person re-identification and tracking to support biometric matching and detect anomalies, e.g. evasion of the control) [R4]. The overall PROTECT multibiometric verification system underwent significant technical and user validation in accordance with international standards, including how to overcome practical issues that may be encountered in real-world deployments [R4]. Additionally, Reading’s research addressed iris biometrics, specifically a novel convolution neural network (CNN)-based semantic segmentation of irises captured by mobile sensors which significantly improves iris recognition performance [R5], and a new framework for quality-based iris segmentation, with particular focus on unconstrained settings representative of traveller on-the-move scenarios [R6]. PROTECT has resulted in a ground-breaking traveller identification system which improves the security and efficiency of the border identification process, is applicable to land, sea and air borders, and incorporates strong user-centric features.
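A standard building block behind multimodal fusion of the kind described above is score-level fusion: raw matcher scores on different scales are normalised to a common range and then combined. The sketch below is an illustration of that general technique under invented score ranges and weights, not the project's actual algorithm:

```python
# Illustrative score-level fusion for multimodal biometrics: min-max
# normalisation of raw matcher scores, then a weighted-sum combination.

def min_max_normalise(score, lo, hi):
    """Map a raw matcher score onto [0, 1] given that matcher's score range."""
    return (score - lo) / (hi - lo)

def fuse_scores(modality_scores, weights):
    """Weighted-sum fusion of already-normalised per-modality scores."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[m] * s for m, s in modality_scores.items())

# Hypothetical raw scores: the face matcher reports 0-100, the iris matcher 0-1.
face = min_max_normalise(82.0, 0.0, 100.0)
iris = min_max_normalise(0.91, 0.0, 1.0)
fused = fuse_scores({"face": face, "iris": iris}, {"face": 0.5, "iris": 0.5})
```

Normalising before fusing matters because otherwise the modality with the larger numeric range would silently dominate the combined score.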
3. References to the research
The research resulted from competitive, peer-reviewed funding applications and was published in peer-reviewed journals and conference proceedings. Internal peer review against REF criteria judged the overall outputs’ profile to meet or exceed 2* quality through its contribution of new and useful applied concepts, knowledge, methods and performance assessment in biometrics and surveillance.
[R1] Wild, P., Radu, P., Chen, L. and Ferryman, J. (2016). ‘Robust multimodal face and fingerprint fusion in the presence of spoofing attacks’. Pattern Recognition, 50, 17-25. ISSN 0031-3203 DOI: https://doi.org/10.1016/j.patcog.2015.08.007
[R2] Wild, P., Radu, P. and Ferryman, J. (2015). ‘On fusion for multispectral iris recognition’. 8th IAPR International Conference on Biometrics (ICB2015), 19-22 May 2015, Phuket, Thailand, 31-73. ISSN 2376-4201 DOI: https://doi.org/10.1109/ICB.2015.7139072
[R3] Patino, L. and Ferryman, J. (2014). ‘Multiresolution semantic activity characterisation and abnormality discovery in videos’. Applied Soft Computing, 25, 485-495. ISSN 1568-4946 DOI: https://doi.org/10.1016/j.asoc.2014.08.039
[R4] Galdi, C., Boyle, J., Chen, L., Chiesa, V., Debiasi, L., Dugelay, J.-L., Ferryman, J., Grudzien, A., Kauba, C., Kirchgasser, S., Kowalski, M., Linortner, M., Maik, P., Michon, K., Patino, L., Prommegger, B., Sequeira, A., Szklarski, Ł. and Uhl, A. (2020). ‘PROTECT: Pervasive and useR fOcused biomeTrics bordEr projeCT. A Case Study’. IET Biometrics, 9(6), 297-308. ISSN 2047-4938 DOI: https://doi.org/10.1049/iet-bmt.2020.0033
[R5] Hofbauer, H., Jalilian, E., Sequeira, A., Ferryman, J. and Uhl, A. (2019). ‘Mobile NIR iris recognition: identifying problems and solutions’. Proceedings of the IEEE 9th International Conference on Biometrics: Theory, Applications, and Systems (BTAS2018), 22 October 2018, Los Angeles, USA, 1-9. ISSN 2474-9680 DOI: https://doi.org/10.1109/BTAS.2018.8698590
[R6] Wild, P., Hofbauer, H., Ferryman, J. and Uhl, A. (2016). ‘Quality-based iris segmentation-level fusion’. EURASIP Journal on Information Security, 25. ISSN 1687-417X DOI: https://doi.org/10.1186/s13635-016-0048-x
Research Grants:
[G1] European Commission: FastPass (A harmonized, modular reference system for all European automated border crossing points), between 1 January 2013 and 31 March 2017, overall budget EUR15,592,395.28. Grant agreement no. 312583.
[G2] European Commission: PROTECT (Pervasive and UseR Focused BiomeTrics BordEr ProjeCT), between 1 September 2016 and 31 August 2019, overall budget EUR4,981,752.50. Grant agreement no. 700259.
4. Details of the impact
Impact on border guards and authorities: FastPass developed a next-generation harmonized ABC eGate solution, integrating novel technology modules for traveller identification and monitoring [S6]. The reference implementation, for the first time, facilitated adaptability and deployment to all types of border (land, sea and air) based on an open-system architecture. The solution represents the first European ABC solution for cars at land borders, as well as the first solution for cruise ships [S1a]. The overall FastPass solution was thoroughly evaluated over several months at three different border control points: the Port of Piraeus in Greece, the Airport of Vienna in Austria, and the land border crossing point of Moravita in Romania. More than 10,000 travellers and approximately 200 border guards used the system, providing novel analysis and insight into the different scenarios and their results from technical, operative, social and legal perspectives [S1a]. In 2020, the FastPass eGate solution was positively evaluated for clearance of people in vehicles at Dunkirk ferry port [S1b]. Finally, the Reading researchers contributed to an in-depth evaluation of the FastPass solution, on biometrics and monitoring, which resulted in a best-practice report, ‘Recommendations for future ABC installations’, for setting up, operating and assessing ABC systems [S2], published via FRONTEX – the European Border and Coast Guard Agency – to EU border guards.
PROTECT’s system is world-leading. It is the first example anywhere of a multimodal biometrics on-the-move system for border control incorporating mobile and advanced passports, applicable to all border crossing types. As well as a series of technical reports, the PROTECT system was demonstrated in two real-world locations in 2019: at London St. Pancras Eurostar international train station in collaboration with Border Force and Eurostar, and at a border guard training facility at Kętrzyn, Poland, in collaboration with the Polish Border Guard [S6].
Prof. Ferryman directly engaged with the UK Home Office (Border Force and Her Majesty’s Passport Office (HMPO)) and the Cabinet Office as key stakeholders. The Cross-Government Border Delivery Group on Future Borders in the Cabinet Office stated that “Overall the PROTECT outcomes have contributed to implementation of the 2025 UK Border Strategy.” The 2025 UK Border Strategy is Her Majesty’s Government’s exploration of how new digital systems can improve trader and traveller experience at the border and make the UK more secure [S3]. These new digital systems include a contactless travel model, digital travel credentials and reduction in transaction times. The Cabinet Office stated that PROTECT has shown how “the transaction time at the border can be reduced to zero for a fully contactless control”, enabling “significant improvement in the flow of low risk travellers through the border”, that PROTECT increases deployment of ABC at sea and land borders, over existing use at airports only, and that PROTECT “supports health protection in a post-pandemic environment” [S3]. HMPO stated that PROTECT has provided them with “an appreciation of the infrastructure [supporting digital travel credentials] which would be required” [S4], with the Cabinet Office adding that PROTECT “enables a reduction in the need for fixed and costly infrastructure as used in existing control systems.” [S3].
Outside of the UK, the Polish Border Guard stated that PROTECT is a solution to “further improve the border traffic control process at the external borders of the European Union” [S5]. Specifically, it “provides the Polish Border Guard with a high level of biometric checks which helps them to deal with increasing vehicle flows. This means border guard experts can focus on high risk passengers, whereas low-risk passengers can go through the border control process smoothly and quickly” [S6], a view similarly endorsed by the Cabinet Office [S3]. Finally, Professor Ferryman has produced a white paper on the outcomes of PROTECT which has directly informed FRONTEX [S7]. Overall, the PROTECT outcomes have been validated by significant authorities with direct responsibility for implementing future border control systems.
Impact on commercial sector: PROTECT’s main industrial partner, Veridos GmbH, is a world-leading provider of identity solutions. PROTECT has been a key driver in the formation of a new organisational unit within Veridos focussing on document verification, ABC, self-service enrolment kiosks, non-stop access control solutions and non-stop traveller verification. PROTECT has directly led to the development and deployment of new products (the VeriGO® eAccess portal and the VeriGO® eVisa mobile app), exploiting PROTECT RFID (radio-frequency identification) and NFC (near-field communication) innovations, and to requests for information (RFIs) from customers for deployment of PROTECT sUHF (secure ultra-high frequency) access control solutions beyond border security, for critical infrastructure protection. PROTECT directly led to Veridos’ decision to invest in, and lead on, a new competitively won EUR7m EU research and innovation project, D4fly, which directly exploits the PROTECT innovations in free-flowing biometric identification and integrates them into the wider identity lifecycle of a traveller [S8].
Impact on policy, data protection and legislation makers: The EC stated in their policy impact assessment of the PROTECT project: “PROTECT provides the proof of concept of a new way to control border crossing using multiple new technologies, in particular using multiple biometrics. The faster pre-enrolment using modern solutions like the biometric corridor could speed up the border crossing process. The multi-factor identification process will also improve the security and efficiency of the border control process.” [S9a]. Ferryman quickly established that such innovations were ahead of current EU law in border security: they simply could not be implemented today. This is because the use of technologies such as smartphones and next-generation multimodal passports (incorporating additional biometrics beyond face and fingerprint) in the border control process is not in accordance with the current legal framework for border crossings in the EU (the Schengen Borders Code). These findings, which were welcomed by the EC, led directly to the EC inviting Ferryman to present to policymakers in Brussels in November 2018, to establish the changes in legislation needed to adopt PROTECT’s innovations. With respect to wider political debate, Ferryman contributed to two Civil Service World round-table discussions on how PROTECT’s innovations will change immigration and security systems for borders of the future.
PROTECT has given careful consideration to the implications of biometrics in terms of legislation, citizens’ and residents’ rights and freedom of movement. The PROTECT team has specifically sought ways to empower the public, putting them in a position where they can see and understand their own personal data and how it is being used, especially in relation to the development and use of a PROTECT smartphone app. The Information Commissioner’s Office (ICO) noted that PROTECT’s work, particularly on development of methods to mitigate potential data protection issues, has positively “informed their work,” “extended their expertise,” and overall has “contributed to an increase in ICO’s corporate level of understanding” [S9b].
Impact on standards: According to the British Standards Institute (BSI), PROTECT has “emphatically impacted international standardisation efforts [in contactless and frictionless forms of biometric identification], and the biometrics industry more generally” [S10]. Specifically, Ferryman has informed the scope and content of a new international ISO standard (ISO/IEC WD TS 22604) on ‘biometric recognition of subjects in motion in access related systems’ [S10], providing guidance to practitioners on the use of biometric-recognition-in-motion technologies. Further, PROTECT’s innovations have contributed to two separate International Civil Aviation Organisation (ICAO) topics: ICAO’s Logical Data Structure (LDS) 2.0 – the next evolution of ePassport standards – and ICAO’s Digital Travel Credentials (DTC), which temporarily or permanently substitute a conventional passport with a digital representation of the traveller’s identity. Specifically, PROTECT contributed to several ICAO technical reports on LDS 2.0 (which form part of the upcoming 8th edition of ICAO Doc 9303 on Machine Readable Travel Documents) and DTC [S8].
Summary: The ambitious, ground-breaking research and strong engagement with stakeholders have led to a world-leading transformative traveller identification system. PROTECT demonstrably resolved intractable technical and acceptability barriers to faster and safer border crossings. This has resulted in the worldwide border security industry and related policy sphere dramatically shifting their position on seamless and secure travel and aligning future priorities. PROTECT has informed and delivered on the 2025 UK Border Strategy and has been positively validated by border authority practitioners in four EU Member States. There have been direct benefits for industry, for policymakers and legislators, and for the development of new international standards in biometrics and digital travel credentials. PROTECT is currently being taken up in follow-on EU-funded research which targets the countering of emerging threats across the whole traveller identity lifecycle.
5. Sources to corroborate the impact
[S1] (a) FastPass final project report, 2017; (b) Hauts-de-France deployment, October 2020
[S2] Recommendations for future ABC installations, 2017
[S3] Testimonials from UK Border Force (Home Office), January 2021, and Cabinet Office, February 2021
[S4] Testimonial from HMPO (Home Office), December 2020
[S5] Testimonial from Polish Border Guard, October 2020
[S7] White paper on ‘Secure and Seamless Travel: The PROTECT project’, 2020
[S8] Testimonials from Veridos, June and December 2020
[S9] (a) Impact excerpt from PROTECT final review report (European Commission), 2019; (b) Testimonial from Information Commissioner’s Office, June 2020
[S10] Testimonial from British Standards Institute, December 2020
- Submitting institution
- The University of Reading
- Unit of assessment
- 11 - Computer Science and Informatics
- Summary impact type
- Environmental
- Is this case study continued from a case study submitted in 2014?
- No
1. Summary of the impact
Cross-disciplinary environmental simulation and earth observation communities require prohibitively large amounts of heterogeneous data, beyond the level sensible to replicate and manage at most institutions. Research at Reading has addressed these issues by developing technologies to handle millions of files and petabytes of data. These have made it possible for the UK to be a world leader in delivering large-scale environmental data analytics. The resulting technologies underpin one of the world’s largest multi-petabyte online environmental data archives, CEDA (the Centre for Environmental Data Analysis), hosted on a unique computing facility, JASMIN – a world-leading computational facility that delivers a petascale analytical environment (a ‘data commons’). Experience with JASMIN resulted in: the development of new computer systems at the UK Met Office; improved commercial exploitation of satellite data; and the use of the global Earth System Grid Federation to support the Intergovernmental Panel on Climate Change (IPCC). All these activities have led, and continue to lead, directly to climate science outcomes of relevance to society.
2. Underpinning research
In the early 2000s, the computing and data centre environment consisted of data centre silos decoupled from serious computing capability. Research carried out under the auspices of the UK e-science programme started to address the problems which resulted, such as the plethora of data types arising from different communities, the distributed location of data, the delivery of appropriate compute platforms, and the handling of provenance across millions of files and, eventually, petabytes of data. This work provided the foundations for the developments below.
Data services which need to be interoperable between communities and countries require common understandings of the data, both in terms of inherent meaning and methods of digital encoding. The Climate and Forecast (CF) conventions (https://cfconventions.org) have been developed over two decades to address these problems. The University of Reading team’s major research contribution was to provide a formal understanding of how these conventions could be coded in an interoperable way via a data model and software implementation [R3.1]. The data model and tools allow software engineers to validate their software against expected outcomes and support precision in the CF specification.
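To make concrete what convention-based interoperability means in practice, the sketch below checks a variable's metadata against a couple of CF-style requirements. The rules and attribute names are simplified illustrations, not the actual CF data model or the cf-python implementation:

```python
# Toy illustration of convention checking: a variable's attribute dictionary is
# validated against minimal CF-style rules before its data are treated as
# interoperable. (Rules simplified for illustration.)

REQUIRED_ATTRS = {"standard_name", "units"}

def check_variable(attrs):
    """Return a list of CF-style problems found in one variable's attributes."""
    problems = [f"missing attribute: {a}" for a in sorted(REQUIRED_ATTRS - attrs.keys())]
    if "units" in attrs and attrs["units"] == "":
        problems.append("empty units string")
    return problems

ok = check_variable({"standard_name": "air_temperature", "units": "K"})   # no problems
bad = check_variable({"units": ""})                                        # two problems
```

A formal data model plays the same role at much greater depth: it pins down what a conformant file must contain, so that independent software implementations can be validated against the same expected outcomes.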
Early experience with grids led to the Reading researchers’ recognition that working towards a single solution was not the right approach, and that there were two separate use cases which could be better resolved with the provision of separate, but complementary, solutions. These two use cases, termed ‘common communities, distributed data’, and ‘disparate communities with some common data’, required independent solutions, the development of which form the body of this case.
Common communities, distributed data:
This problem was dealt with first, by co-developing (in a larger international consortium) the Earth System Grid Federation (ESGF), which provides distributed management of climate data to support discovery and download services [R3.2]. The ESGF depends on harvesting metadata from data formatted according to an extension of the CF conventions, distributing that metadata, and then using it in multiple portals providing data discovery via faceted browse. This includes user authentication and authorisation, allowing data to be downloaded from wherever they are held, whether locally or further afield. The underlying use case for the ESGF was to support coupled model-intercomparison projects (CMIPs), which involve modelling groups worldwide carrying out specific numerical simulations and producing specific data for intercomparison. The CMIP ‘Data Request’ [R3.3] and the ‘Data Reference Syntax’ (available online) were key inputs to the process, providing scientists with guidance on what was required and how to format it for the ESGF (and further intercomparison). With European Commission support [G3.1], an ontology for providing data provenance for numerical simulations [R3.4] was developed. This work provides the first comprehensive data model for complex numerical simulation workflows in environmental science, and the first ontological descriptions of numerical experiments prior to their execution. Additional work at the University of Reading, exploiting a decade of investment using NERC national capability funding [G3.2] and targeting understanding and supporting the climate forecast conventions for the NetCDF data format, culminated in a formal data model for CF-compliant archive metadata along with a comprehensive Python implementation [R3.1]. All three activities (data reference syntax, simulation ontology, and CF data model with Python implementation) exploited fundamental concepts in data modelling, applied in new applications.
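The faceted-browse pattern described above can be sketched in a few lines: each harvested record carries facet values, and a query narrows the catalogue one facet at a time. The record fields and values below are invented for illustration and do not reproduce the ESGF schema:

```python
# Toy faceted browse over harvested metadata records, in the style of an ESGF
# discovery portal: each facet constraint narrows the set of matching datasets.

RECORDS = [
    {"id": "ds1", "project": "CMIP6", "variable": "tas", "frequency": "mon"},
    {"id": "ds2", "project": "CMIP6", "variable": "pr",  "frequency": "day"},
    {"id": "ds3", "project": "CMIP5", "variable": "tas", "frequency": "mon"},
]

def facet_search(records, **facets):
    """Keep only the records matching every requested facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]

hits = facet_search(RECORDS, project="CMIP6", variable="tas")
```

The value of controlled facet vocabularies (as in the Data Reference Syntax) is that every modelling group labels its output the same way, so one query works across the whole federation.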
Disparate communities with (some) common data:
CEDA holds over 13PB of environmental data, organised into over 2,000 datasets in 300 dataset collections. These span hundreds of parameters from sources including aircraft campaigns, satellites, radars, automatic weather stations, climate models, and more. The research challenge is making these data discoverable, accessible, and usable alongside other user-centric sources of data. Reading research to address these issues included improved cataloguing systems [R3.5] and the architecting and implementation of JASMIN [R3.6]. JASMIN is a unique data-centric computing platform which provides a range of storage technologies and customisable computing.
[R3.5] exploited a taxonomy and concepts developed during the e-Science era to create a new cataloguing system, capable of providing more than just traditional data discovery by providing a browse-based method of moving between data descriptors. The Reading research explored several implementations using a range of technologies before settling on the database structures described in [R3.5], which are still in use today and underpin all the various views of more than 50,000 different metadata artefacts. With petabytes of data distributed across hundreds of millions of files, the catalogue system is integral to the use of JASMIN, as no available file system supports performant browsing at this scale.
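The browse-based navigation idea can be sketched as typed links between catalogue records: rather than walking millions of files, a user follows links from one metadata artefact to related ones. The identifiers and link types below are invented for illustration and are not the actual catalogue schema:

```python
# Sketch of browse-based navigation between metadata artefacts: catalogue
# records are nodes, and typed links connect, e.g., a dataset to the
# instrument and platform that produced it.

LINKS = {
    "dataset:cloud-radar-2015": [("instrument", "instrument:ka-band-radar"),
                                 ("platform", "platform:chilbolton")],
    "instrument:ka-band-radar": [("platform", "platform:chilbolton")],
}

def browse(artefact, link_type):
    """Follow links of one type from a catalogue record to related records."""
    return [target for kind, target in LINKS.get(artefact, []) if kind == link_type]

related = browse("dataset:cloud-radar-2015", "platform")
```

Because the links live in a database rather than in a directory hierarchy, this kind of traversal stays fast no matter how many files sit behind each record.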
The new JASMIN architecture solves the ‘disparate communities with common data’ problem by providing software, platform, infrastructure, and data as a service for a range of communities. JASMIN was shaped by e-science experience, and a sustained programme of research [G3.1], [G3.2], [G3.3]. It was the first large computing system designed primarily for petascale data analysis that included a curated petascale archive alongside petascale compute resources. Over the decade since JASMIN was first commissioned, it has progressively included more cloud computing capability and new storage technologies (twice deploying the world’s largest pools of new storage technologies, based on proof-of-concept research carried out by the JASMIN team). The ‘Software-as-a-service’ work developed many different methods and models for allowing users to manipulate data ‘server-side’ (e.g. [G3.4]), the most recent of which is the European Space Agency (ESA) Open Data Portal [R3.7] which utilises the JASMIN cloud.
3. References to the research
The research resulted from sustained national capability and external competitive funding. All outputs except [R3.5] were published in peer-reviewed journals ([R3.5] appears in a peer-reviewed journal, but in a non-reviewed section). The research meets or exceeds the 2* quality level definitions through defining and implementing new processes, techniques and methodologies for dealing with and managing large-scale environmental data, and introducing fundamental new ideas for describing simulation requirements and properties.
Hassell, D., Gregory, J., Blower, J., Lawrence, B. N., & Taylor, K. E. (2017). ‘A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)’. Geoscientific Model Development, 10(12), 4619–4646. DOI: https://doi.org/10.5194/gmd-10-4619-2017
Balaji, V., Taylor, K. E., Juckes, M., Lawrence, B. N., Durack, P. J., Lautenschlager, M., Blanton, C., Cinquini, L., Denvil, S., Elkington, M., Guglielmo, F., Guilyardi, E., Hassell, D., Kharin, S., Kindermann, S., Nikonov, S., Radhakrishnan, A., Stockhause, M., Weigel, T., & Williams, D. (2018). ‘Requirements for a global data infrastructure in support of CMIP6’. Geoscientific Model Development, 11, 3659–3680. DOI: https://doi.org/10.5194/gmd-11-3659-2018
Juckes, M., Taylor, K. E., Durack, P. J., Lawrence, B., Mizielinski, M. S., Pamment, A., Peterschmitt, J.-Y., Rixen, M., & Sénési, S. (2020). ‘The CMIP6 Data Request (DREQ, version 01.00.31)’. Geoscientific Model Development, 13(1), 201–224. DOI: https://doi.org/10.5194/gmd-13-201-2020
Pascoe, C., Lawrence, B. N., Guilyardi, E., Juckes, M., & Taylor, K. E. (2020). ‘Documenting numerical experiments in support of the Coupled Model Intercomparison Project Phase 6 (CMIP6)’. Geoscientific Model Development, 13(5), 2149–2167. DOI: https://doi.org/10.5194/gmd-13-2149-2020
Parton, G. A., Donegan, S., Pascoe, S., Stephens, A., Ventouras, S., & Lawrence, B. N. (2015). ‘MOLES3: Implementing an ISO standards driven data catalogue’. International Journal of Digital Curation. 10(1). DOI: https://doi.org/10.2218/ijdc.v10i1.365
Lawrence, B.N., Bennett, V. L., Churchill, J., Juckes, M., Kershaw, P., Pascoe, S., Pepler, S., Pritchard, M., & Stephens, A. (2013). ‘Storing and manipulating environmental big data with JASMIN’. 2013 IEEE International Conference on Big Data, 68–75. DOI: https://doi.org/10.1109/BigData.2013.6691556
Kershaw, P., Halsall, K., Lawrence, B. N., Bennett, V., Donegan, S., Iwi, A., Juckes, M., Pechorro, E., Petrie, R., Singleton, J., Stephens, A., Waterfall, A., Wilson, A., & Wood, A. (2020). ‘Developing an Open Data Portal for the ESA Climate Change Initiative’. Data Science Journal, 19, 16. DOI: https://doi.org/10.5334/dsj-2020-016
Key Projects/Grants (all with Lawrence as the principal investigator):
Infrastructure for the European Network for Earth System Modelling (ENES) series of grants from the European Commission: Metafor (between 2008 and2011), IS-ENES2 (2013-2017), IS-ENES3 (2019-2022).
- Supported the ontology work and now supports the Climate Forecast conventions (CF).
NCAS Computational Model Services (Annual NERC National Capability Contract via the University of Leeds, between 2012 and 2018; from 2019 subsumed into the overall NCAS contract).
- Delivered the support for JASMIN R&D (JASMIN operations funded at STFC). Supported the CF data model work prior to EC Funding.
European Centre of Excellence in Weather and Climate Computing (European Commission): ESiWACE (between 2015 and 2019), ESiWACE2 (2019-2022).
- Supported work on new storage systems and data access software.
Copernicus Programme: C3S-MAGIC (between 2016 and 2017).
- Supported work on ‘climate diagnostics’ that can be run ‘server-side’ as part of a climate toolbox (in the Copernicus Climate Store).
Met Office Contracts: ‘HPC QA for the Met Office Supercomputing Procurement’ (2014-2015, Contract H514900). ‘HPC Support’ (2013-2019, Contract H5169100).
- Supported advice to the Met Office on their ‘JASMIN-like’ procurement (SPICE), and research on parallel data manipulation on JASMIN.
4. Details of the impact
Environmental science is heavily dependent on big data and has always relied on bleeding-edge data-handling technologies. The research described here has directly enabled science outcomes which would otherwise have been difficult, or even impossible, to achieve given the data volumes and disparate communities involved. These outcomes exploited three separate activities: (i) the use of the JASMIN data commons, (ii) the use of the Earth System Grid Federation, and (iii) the re-use of metadata techniques and tools by third parties; all of which arose from the research summarised here.
JASMIN data commons usage: JASMIN is used by a range of environmental scientists who wish either to share data or to directly exploit the massive volume of data held in the archives of CEDA. CEDA itself is a key tenant of the JASMIN system, alongside a number of other organisations and individuals. The architecture and GBP20m implementation of JASMIN were a primary research outcome – the operational implementation is now supported by an annual budget in excess of GBP1m via contributions from the National Centre for Atmospheric Science (NCAS), the National Centre for Earth Observation (NCEO), the Natural Environment Research Council (NERC), and the UK Space Agency.
The JASMIN data commons depend on the data gravity exerted by the petascale CEDA archive: users are incentivised to bring their own data alongside the archive, in turn attracting other users (and their data) for collaboration. In excess of 1,500 users exploit JASMIN directly, with approximately 25,000 additional users exploiting CEDA services hosted on JASMIN (Statistics, [S5.1]). The 1,500 direct users are organised into more than 250 distinct groups (tenancies) sharing data and/or compute resources for data analysis – users can, and do, belong to multiple groups. JASMIN is now an integral part of UK and global science, yielding a wide variety of outcomes. Representative key JASMIN usage examples are listed below (but see also [S5.2]):
The ESA climate change portal (http://cci.esa.int/data) is deployed in the JASMIN cloud, making use of metadata and data distribution technologies which have evolved from the original DataGrid work [R3.7].
JASMIN provides the underpinning services for UKSA contingency planning to retain access to EC Copernicus data post-Brexit, and for the commercial activities of the Science and Technology Facilities Council’s RAL Space and others [S5.3].
The production of unique rainfall estimates and insurance products for over three million African farmers is now predicated on climate services deployed on JASMIN [S5.4].
The dominant mode of analysis for UK CMIP6 work feeding into the sixth assessment report of the IPCC is the use of JASMIN ([S5.3] and [S5.5]).
The most recent UK State of Nature Report, a key government indicator for guiding environmental policy covering how biodiversity changed from 1970 to 2015, analysed 12,000 species using over 34 million records on JASMIN [S5.6].
All these outcomes were predicated on the co-location of curated data, user data, and specialised computing which arose from the University of Reading research into ‘disparate communities with common data’.
Earth System Grid Federation (ESGF) and policy change: The ESGF provides the mechanisms for global management and distribution of data products produced by environmental groups worldwide – the ‘common community distributed data’ problem. In particular, it was designed to support the fifth and sixth global coupled model intercomparison projects – CMIP5 and CMIP6 – which themselves were timed to support the fifth and sixth assessment reports of the IPCC. The IPCC is a UN entity created to provide policymakers with regular scientific assessments on climate change, its implications and potential future risks, as well as to put forward adaptation and mitigation options. Those responsible for the fourth assessment report of the IPCC shared in the 2007 Nobel Peace Prize.
As of July 2020, the ESGF hosts 28 different projects comprising 21 petabytes of data in 7.3 million datasets, distributed across 15 data nodes which provide data downloads to 140 countries [S5.7]. The UK ESGF data nodes are deployed by CEDA and hosted on JASMIN: one catalogues UK data for third-party download, and one hosts third-party data which has been replicated to JASMIN to avoid multiple petascale downloads to the UK. The research described here was integral to the initial architecture and deployment in CMIP5 [S5.3].
Much of the ESGF data is simulation data, used directly by academics and climate change consultants, and indirectly by those making commercial and policy decisions. All can find it difficult to understand the difference between different simulations. The Reading research on simulation provenance [R3.4] provided technologies to support the creation, extraction, and comparison of simulation documentation, directly addressing this issue. As a consequence, the Reading researchers developed new systems, and all modelling groups have been mandated [R3.2] to use them to construct simulation documentation for the sixth global model intercomparison project (CMIP6). (The mandate is described in [R3.3] and further evidenced in [S5.5].)
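The value of machine-readable simulation documentation can be seen in a minimal sketch. This is illustrative only: plain Python dataclasses stand in for the far richer real documentation records, and the field names here are invented for the example, not taken from any actual schema.

```python
from dataclasses import dataclass, asdict

# Illustrative sketch: a minimal simulation-documentation record.
# Real CMIP documentation captures far more (model components,
# forcings, conformance to experiment requirements, parentage, etc.).
@dataclass
class SimulationDoc:
    model: str             # which climate model produced the run
    experiment: str        # which defined experiment it follows
    ensemble_member: str   # variant label distinguishing ensemble runs
    forcings: tuple = ()   # external forcings applied

def compare(a: SimulationDoc, b: SimulationDoc) -> dict:
    """Report the fields on which two simulation descriptions differ,
    which is exactly the question users of intercomparison data ask."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}

hist = SimulationDoc("HadGEM3", "historical", "r1i1p1f1", ("GHG", "aerosol"))
nat = SimulationDoc("HadGEM3", "hist-nat", "r1i1p1f1", ("natural-only",))
print(compare(hist, nat))  # differs on 'experiment' and 'forcings' only
```

With structured records like these, the comparison is mechanical; with free-text documentation (or none), each user must reconstruct the differences by hand.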
Reuse: Two examples are presented: (1) how the Climate and Forecast (CF) metadata conventions [R3.1] are integral to weather forecasting and earth observation [S5.8]; and (2) how the JASMIN concept was copied and implemented by the Met Office to provide an internal capability (SPICE) which matched the JASMIN provision to the wider UK community [S5.9].
The importance of the CF conventions to applications in weather and climate is well known and accepted, but prior to the advent of Reading’s data model they were not readily interoperable with other systems. With the data model in place, the World Meteorological Organization is now exploring the development of new regulations to encourage such interoperability. Whether or not these are put in place, the use of CF and the data model is now changing information transfer in global weather forecasting (on top of its integral role in climate model intercomparison). CF is also becoming integral to earth observation, allowing major satellite data providers to lower costs and make data available to more users. The Reading team’s data model has been a key part in making data products accessible to more of the users of Europe’s largest provider of satellite-based meteorological data, EUMETSAT [S5.8].
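The mechanism behind CF is simple: controlled metadata attributes (such as a `standard_name` drawn from a shared vocabulary, and physically meaningful `units`) are attached to each variable in a data file, so CF-aware software can interpret the data without out-of-band documentation. The sketch below is illustrative only: plain dictionaries stand in for real netCDF variable attributes, and the check shown is a drastically simplified subset of CF, not the actual conventions or data model.

```python
# Illustrative sketch: CF-style variable metadata, with dicts standing
# in for the attributes a variable would carry in a real netCDF file.
air_temp = {
    "standard_name": "air_temperature",  # from the shared CF vocabulary
    "units": "K",                        # machine-convertible units
    "long_name": "near-surface air temperature",
}

def minimally_described(attrs: dict) -> bool:
    """Simplified check (not real CF validation): a data variable
    should say what it is and give its units."""
    has_identity = "standard_name" in attrs or "long_name" in attrs
    return has_identity and "units" in attrs

print(minimally_described(air_temp))        # True
print(minimally_described({"units": "K"}))  # False: no identity given
```

Because the vocabulary is shared, any tool that recognises `standard_name: air_temperature` can find, plot, or convert the variable regardless of which model or instrument produced it; that is the interoperability the data model formalises.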
JASMIN was the first environmental ‘super data computer’, showing the benefit of centralising high-volume data with dedicated high-performance analysis compute. The existence of JASMIN has enabled international research collaborations such as PRIMAVERA (an EU Horizon 2020 research project involving the University of Reading and 19 European partners, with the aim of developing a new generation of advanced global climate models for the benefit of governments, business and society) [S5.10]. What is more, JASMIN has sped up research workflows to the extent that some tasks which previously would have taken years now take days. This allows some researchers to have, and test, as many ideas in a year as they previously could have done in an entire career with traditional systems. The impact on research science was significant enough that the Met Office designed and procured its own ‘mini-JASMIN’, called SPICE (Scientific Processing and Intensive Compute Environment), contracting Lawrence to provide advice and quality assurance [G3.5]. SPICE now underpins Met Office research across both weather and climate [S5.9].
Summary: Large-scale environmental science depends on vast volumes (petabytes) of data, typically in millions of files. The research carried out at Reading on techniques for managing and manipulating data at this scale (on vocabularies and tools, the Earth System Grid Federation, and the design and implementation of JASMIN, a computational facility that delivers a ground-breaking petascale analytical environment) has been essential to the work of multi-national agencies (for example, EUMETSAT). It has underpinned the research that led to delivery of the UK climate impact projections and the Paris Agreement of the UN Framework Convention on Climate Change [S5.5], and the research that will underpin the next assessment of the IPCC. The next decade will see new challenges in moving from petabytes to exabytes, necessitating further developments in tools and information systems.
5. Sources to corroborate the impact
1. (i) CEDA/JASMIN statistics via annual reports at https://www.ceda.ac.uk/about/highlights/ and (ii) https://manage.jasmin.ac.uk/projects/
2. JASMIN Science Case: http://cedadocs.ceda.ac.uk/1350/1/JASMIN_Science_Case.pdf
3. Testimonial from Director of RAL Space, July 2020
4. Testimonial from Director of TAMSAT, September 2020
5. World Climate Research Programme (WCRP), Testimonial from Head of Understanding Climate Change, Met Office, July 2020
6. Using JASMIN for the largest ever UK wildlife assessment: https://www.ceda.ac.uk/blog/using-jasmin-for-the-largest-ever-uk-wildlife-assessment/
7. ESGF statistics: http://esgf-ui.cmcc.it/esgf-dashboard-ui/
8. Testimonial from EUMETSAT, June 2020
9. Testimonial from Met Office re SPICE, July 2020
10. Testimonial from Royal Netherlands Meteorological Institute re PRIMAVERA, July 2020