
Consensus Study Report
NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
This activity was supported by a contract between the National Academy of Sciences and Open Philanthropy. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.
International Standard Book Number-13: 978-0-309-72666-5
Digital Object Identifier: https://doi.org/10.17226/27970
This publication is available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242; http://nap.nationalacademies.org.
The manufacturer’s authorized representative in the European Union for product safety is Authorised Rep Compliance Ltd., Ground Floor, 71 Lower Baggot Street, Dublin D02 P593 Ireland; www.arccompliance.com.
Copyright 2025 by the National Academy of Sciences. National Academies of Sciences, Engineering, and Medicine and National Academies Press and the graphical logos for each are all trademarks of the National Academy of Sciences. All rights reserved.
Printed in the United States of America.
Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2025. Machine Learning for Safety-Critical Applications: Opportunities, Challenges, and a Research Agenda. Washington, DC: National Academies Press. https://doi.org/10.17226/27970.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. Tsu-Jae King Liu is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process, and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
Rapid Expert Consultations published by the National Academies of Sciences, Engineering, and Medicine are authored by subject-matter experts on narrowly focused topics that can be supported by a body of evidence. The discussions contained in rapid expert consultations are considered those of the authors and do not contain policy recommendations. Rapid expert consultations are reviewed by the institution before release.
For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
GEORGE PAPPAS (NAE), University of Pennsylvania, Chair
YIRAN CHEN, Duke University
WERNER DAMM, University of Oldenburg (resigned on July 11, 2024)
THOMAS DIETTERICH, Oregon State University (Emeritus)
MATTHEW GASTON, Carnegie Mellon University
ANOUCK GIRARD, University of Michigan
ROBERT GRIFFIN, IBM
JONATHAN P. HOW (NAE), Massachusetts Institute of Technology
ASHLEY LLORENS, Microsoft
LYDIA TAPIA, University of New Mexico
AIDONG ZHANG, University of Virginia
THƠ H. NGUYỄN, Senior Program Officer, Study Director
JON K. EISENBERG, Senior Board Director
GABRIELLE M. RISICA, Program Officer
NNEKA A. UDEAGBALA, Associate Program Officer
SHENAE A. BRADLEY, Administrative Coordinator
LAURA HAAS (NAE), University of Massachusetts Amherst, Chair
DAVID DANKS, University of California, San Diego
CHARLES ISBELL, University of Wisconsin–Madison
ECE KAMAR, Microsoft Research Redmond
JAMES F. KUROSE (NAE), University of Massachusetts Amherst
DAVID LUEBKE, NVIDIA Corporation
DAWN MEYERRIECKS, The MITRE Corporation
WILLIAM SCHERLIS, Carnegie Mellon University
HENNING SCHULZRINNE, Columbia University
NAMBIRAJAN SESHADRI (NAE), University of California, San Diego
KENNETH E. WASHINGTON (NAE), Medtronic, Inc.
JON K. EISENBERG, Senior Board Director
THƠ H. NGUYỄN, Senior Program Officer
GABRIELLE M. RISICA, Program Officer
NNEKA A. UDEAGBALA, Associate Program Officer
SHENAE A. BRADLEY, Administrative Coordinator
AARYA SHRESTHA, Senior Financial Business Partner
This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.
We thank the following individuals for their review of this report:
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report nor did they see the final draft before its release. The review of this report was overseen by BRYNA KRA (NAS), Northwestern University, and BARBARA GROSZ (NAE), Harvard University. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring committee and the National Academies.
1 ENGINEERING SAFETY-CRITICAL SYSTEMS IN THE AGE OF MACHINE LEARNING
1.2 Emergence of Machine Learning in Safety-Critical Systems
2 STATE OF THE ART, PROMISES, AND RISKS OF MACHINE LEARNING
2.1 Emerging Machine Learning–Enabled Capabilities
2.2 Machine Learning in Intelligent Infrastructure
2.3 Machine Learning in Health Care
2.4 Machine Learning in Manufacturing
2.5 Machine Learning in Automotive Systems
2.6 Emerging Risks for Machine Learning in Safety-Critical Applications
3 SYSTEM ENGINEERING WITH MACHINE LEARNING COMPONENTS FOR SAFETY-CRITICAL APPLICATIONS
3.1 State of Practice in Safety-Critical System Design
3.3 Integrating Machine Learning into Safety Engineering
4 A RESEARCH AGENDA TO BRIDGE MACHINE LEARNING AND SAFETY ENGINEERING
4.1 Data Engineering Tools and Techniques
4.2 Benchmark Data Sets for Safety-Critical Machine Learning
4.3 Learning Algorithms and Learning Theory for Safety-Critical Machine Learning
Artificial intelligence (AI), enabled by machine learning (ML) algorithms, has been hailed as one of the most transformative advances of this century. ML capabilities are finding applications and impact across societal sectors, enabling new capabilities in critical infrastructure, transportation, health care, and more. They are being integrated into physical systems where ML-enabled components directly inform system actions, even as well-known shortcomings, such as a lack of systematicity, consistency, and explainability, remain unsolved or raise new challenges as larger and more complex models are used. Thus, questions about performance, reliability, and trustworthiness are front and center, especially when ML is used in a safety-critical application.
Safe deployment of ML is a national priority. The National Institute of Standards and Technology (NIST) developed a taxonomy of AI risks that characterizes attributes of trustworthy AI.1 This work contributed to NIST’s AI Risk Management Framework, which describes strategies for identifying and adopting risk mitigation into the AI development life cycle.2
With support from the philanthropic foundation Open Philanthropy, the National Academies of Sciences, Engineering, and Medicine convened the Committee on Using Machine Learning in Safety-Critical Applications: Setting a Research Agenda to study the state of ML in
___________________
1 National Institute of Standards and Technology (NIST), 2021, “Taxonomy of AI Risk,” https://www.nist.gov/system/files/documents/2021/10/15/taxonomy_AI_risks.pdf.
2 NIST, 2023, “Artificial Intelligence Risk Management Framework (AI RMF 1.0),” https://doi.org/10.6028/NIST.AI.100-1.
safety-critical applications, identify key gaps, ranging from technical capabilities to culture and practices, between ML development and safety engineering, and propose a research agenda to bridge those gaps (Appendix A provides the full statement of task). The committee engaged with leading researchers from industry and academia, automation experts, and safety professionals to understand both the progress made and the work that remains; Appendix B lists the briefings received.
To address the statement of task, the committee defined safety in the context of safety-critical systems, identified new challenges and considerations for developing metrics to assess safety, and explained why new perspectives are needed to evaluate robustness for cyber-physical systems that use ML components. The report also identifies a research agenda to bridge key gaps in integrating ML into safety-critical systems.
Note that a key consideration in safety-critical systems is the integral role of the human whose safety is at stake. The human plays a complex role that ranges from passive user (e.g., passenger or bystander) to controller (e.g., pilot). A system's safety performance is also tied to the decisions made by human developers and system operators. While the role of the human is an important dimension of safety, it deserves deep and broad treatment that lies outside the scope of this study. Likewise, while the (cyber) security of ML and AI systems overlaps with safety, it is another rich topic that deserves its own investigation and is outside the scope of this effort.
This report puts forth an agenda for the research community in both industry and academia. Additionally, the committee offers its findings in the hope of helping governments, standards-setting bodies, companies and industry groups, and public interest groups that grapple with the challenges of using ML in safety-critical systems. Close collaboration and coordination among these groups as regulations, standards, and best practices are developed will be essential to improving safety and building trust in ML-enabled systems.