Suggested Citation: "4 Data for Artificial Intelligence Readiness." National Academies of Sciences, Engineering, and Medicine. 2025. Gilbert W. Beebe Symposium: AI and ML Applications in Radiation Therapy, Medical Diagnostics, and Radiation Occupational Health and Safety. Washington, DC: The National Academies Press. doi: 10.17226/29200.

4
Data for Artificial Intelligence Readiness

A session was devoted to discussing the importance of data collection, curation, and management for use in artificial intelligence (AI), using examples from medical imaging as a focus. This session aimed to address the aspect of the statement of task on intentionality of data collection, detector development, and dataset management for future AI and machine learning (ML) algorithm applications. The session was moderated by Anyi Li, chief of computer service at Memorial Sloan Kettering Cancer Center, and Leo Chiang, senior research and development fellow at the Dow Chemical Company.

DATA FOR AI: THE CRITICAL ROLE OF METADATA AND CONTEXT

Caroline Chung, vice president and chief data and analytics officer, co-director of the Institute for Data Science and Oncology, and professor of radiation oncology and diagnostic imaging at the University of Texas MD Anderson Cancer Center, began the session by talking about the critical role that metadata and context play in data used for AI. Many people think about data readiness in terms of perfectly cleaned data, she said, but that sort of data may not be required “as long as you consider the context of how you’re wanting to use the data.”

The development of AI tools in healthcare faces a fundamental challenge that extends far beyond algorithmic sophistication. As Chung emphasized in her presentation, a significant “AI chasm” exists—that is, the gulf between developing scientifically sound algorithms and their meaningful real-world applications, as described by Keane and Topol (2018). Context, she argued, provides the critical bridge to span this chasm, transforming data science tools from laboratory curiosities into practical healthcare solutions.

She offered that the path to successful AI implementation begins with a fundamental shift in approach. Rather than asking what can be built with available data, Chung suggested starting with clearly defined problems that need to be solved. This problem-first methodology prevents the creation of sophisticated but ultimately useless “widgets” that may perform well in controlled settings but lack practical utility in clinical environments. The availability of training data does not guarantee the availability of similar data in real-world deployment, making context-aware planning essential from conception through implementation, she said. Three considerations can help guide this process: first, ensuring that model training and validation align with actual clinical needs; second, assessing stakeholder and operational readiness for implementation; and third, guaranteeing that model outputs can be communicated effectively to end users in ways they understand and trust.

Healthcare AI increasingly relies on integrating diverse data types—imaging, omics, metabolites, physiological measurements, and environmental factors. However, this integration presents significant contextual


challenges that can undermine model validity. Temporal context is likely crucial: The timing of a biopsy versus omics data collection can render associations meaningless if the timescales are too separated. Similarly, spatial context matters enormously—without knowing the exact location within a tumor where a specimen was obtained, researchers risk making unfounded assumptions about the relationship between imaging findings and molecular profiles.

As Chung noted in her 2021 Cancer Research publication (Chung and Jaffray, 2021), metadata serve as the essential “glue” that brings observations together meaningfully. Without proper context and metadata, even rich datasets can become significantly less valuable and can lead to erroneous conclusions.

The practical implications of inadequate contextual awareness extend beyond theoretical concerns into direct patient care impacts. Chung’s examination of brain magnetic resonance imaging (MRI) data from the National Cancer Institute’s (NCI’s) National Clinical Trials Network revealed imaging slice thicknesses ranging from 1 mm to 5 mm, with most using 4 mm slices (Thrower et al., 2021). This variation has profound consequences: tumors visible at 1 mm resolution can “disappear” when reconstructed with 3 mm slices. A patient transferring between centers using different imaging protocols might suddenly appear to have stage 4 disease, potentially leading to inappropriate clinical trial enrollment based on imaging artifacts rather than disease progression.
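The dilution effect Chung described can be sketched numerically. This is a hedged illustration with hypothetical intensity values, not NCTN data: averaging thin slices into thick ones spreads a small lesion's signal across neighboring tissue.

```python
import numpy as np

# Hypothetical 1D stack of 12 slices at 1 mm spacing: background intensity 100,
# with a 2 mm lesion (slices 5-6) at intensity 160.
thin = np.full(12, 100.0)
thin[5:7] = 160.0

# Reconstructing at 4 mm spacing averages each group of 4 thin slices.
thick = thin.reshape(-1, 4).mean(axis=1)

lesion_contrast_thin = thin.max() - 100.0    # 60.0 above background
lesion_contrast_thick = thick.max() - 100.0  # diluted by partial-volume averaging

print(lesion_contrast_thin, lesion_contrast_thick)
```

With these made-up numbers the lesion's contrast drops by half; a lesion only slightly above the noise floor could fall below a reader's (or a model's) detection threshold entirely.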

The COVID-19 pandemic provided another stark illustration of contextual blindness. Early AI models trained to detect COVID-19 from medical images inadvertently learned to distinguish between pediatric and adult lungs rather than positive and negative cases, because most COVID-19-negative patients were children while COVID-19-positive cases were predominantly adults (Heaven, 2021). The failure to account for age metadata rendered these models fundamentally flawed.

She discussed how electronic health records (EHRs) present additional contextual challenges. Research by Steinkamp and colleagues (2022) revealed extensive copying and pasting in EHRs, propagating duplicate information and errors throughout patient records. While clinicians recognize and work around these inconsistencies through conversations and clinical judgment, data scientists accessing only the recorded information miss those conversations that inform actual medical decisions. This gap between documented and actual clinical reasoning creates significant challenges for AI model development.

Chung explained that effective healthcare AI likely involves dynamic data quality assessment based on intended use. She stated that perfect data are not always necessary—the required quality depends on the specific application. Exploratory research seeking general patterns can tolerate some noise, she elaborated, while clinical prediction models for individual patients necessitate high-quality inputs to minimize uncertainty in life-affecting decisions.

She discussed how data provenance may then be equally critical for building trust and reliability. Users need to understand data origins, cleaning processes, and any transformations applied. Chung described a cautionary example where a published dataset contained significant amounts of imputed data, but hundreds of researchers unknowingly used these artificial data because their provenance was not documented.

Chung stated that less obvious sources of bias may also be considered. Scanning frequency differences can create detection bias—patients scanned every 2 months will show progression earlier than those scanned every 6 months, potentially skewing model predictions. Measurement error bias from scanner variations, protocol differences, and patient cooperation adds additional layers of complexity that context-aware models aim to address.
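The scanning-frequency bias Chung described can be simulated in a few lines. This is a hedged sketch with invented numbers: disease biology is identical across groups, yet the recorded progression time depends on how often patients are scanned.

```python
import random
import statistics

random.seed(0)

def recorded_progression_time(true_time: float, scan_interval: int) -> int:
    """Progression is only detected at the first scan on or after the true time."""
    scan = scan_interval
    while scan < true_time:
        scan += scan_interval
    return scan

# Hypothetical true progression times (months), identical for both schedules.
true_times = [random.uniform(1, 24) for _ in range(1000)]

every_2 = statistics.mean(recorded_progression_time(t, 2) for t in true_times)
every_6 = statistics.mean(recorded_progression_time(t, 6) for t in true_times)
print(round(every_2, 1), round(every_6, 1))  # 6-month schedule records later
```

A model trained on such records without the scan-interval metadata would attribute the systematic difference to the patients rather than the surveillance schedule.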

Chung argued that rather than simplifying or ignoring healthcare’s inherent complexity, it can be embraced as a source of deeper understanding. This approach involves considering both content (the data themselves) and context (the metadata) to enable meaningful data curation and generate genuine insights. Success likely includes starting with clearly defined purposes, building appropriate teams, assembling contextually rich datasets, and maintaining awareness of context throughout development and implementation.


The path forward likely involves robust verification, validation, and uncertainty quantification that considers both data and model limitations. Through this comprehensive approach, healthcare AI systems can build the trust and confidence to inform critical medical decisions. As healthcare continues to generate increasingly complex and diverse datasets, the ability to maintain and leverage contextual understanding will determine which AI tools successfully bridge the chasm between laboratory promise and clinical reality.

She ended with an example of how the integration of emerging technologies like ambient listening may help fill some contextual gaps, but the fundamental challenge remains: creating AI systems that understand not just what the data say but what they mean within the full complexity of healthcare delivery. This contextual awareness, supported by appropriate data governance systems ensuring proper access and attribution, forms the foundation upon which truly useful healthcare AI can be built.

REQUISITES AND CHALLENGES IN QUANTITATIVE IMAGING

Dan Sullivan, professor emeritus at the Duke University Medical Center, spoke about how to interpret quantitative imaging properly. In particular, he explained, while most of the symposium’s speakers had talked about the data in large collections of scans, he focused on the issue of the data within a single image.

Sullivan started by stating that radiology has grappled with a fundamental problem for nearly a century: the persistent variability in radiologists’ subjective interpretations of images across all imaging modalities and clinical applications. As Sullivan emphasized, while little can be done to improve subjective interpretations themselves, the advent of digital imaging has created unprecedented opportunities to extract objective, consistent information from medical images. This shift from subjective interpretation to quantitative measurement represents a critical evolution in medical imaging that could transform diagnostic accuracy and clinical decision making.

The magnitude of interpretive inconsistency in radiology is both striking and clinically consequential, he said. Sullivan highlighted a study by Herzog and colleagues (2017) that exemplifies this challenge. An orthopedic surgeon had a 63-year-old back pain patient receive spine MRI studies every other day for 3 weeks at 10 different New York City hospitals, revealing that no interpretive finding was reported unanimously by all radiologists and one-third of findings appeared only once. This variability indicated significant false negatives and false positives, contradicting expectations that patients should receive consistent radiologic diagnoses regardless of when and where they seek care.

Sullivan argued that if these radiologists had access to AI algorithms providing objective, quantitative information extracted from the images, the reports would likely have shown far less variation in observations and conclusions. While such AI technology is technically feasible today, its adoption remains limited, primarily due to the lack of sufficiently large training datasets for robust AI implementation in radiology.

He noted that a fundamental shift in perspective is required to fully leverage quantitative imaging potential. Sullivan emphasized that clinical images are not merely pictures but represent n-dimensional datasets where each pixel conveys information about physical or biological properties of matter. Unlike radiomics features, which are computer constructs without direct physical reality, quantitative imaging measurements correspond to actual biological and physical characteristics of tissues.

He stated that this distinction is important for understanding image interpretation as a data extraction process. When radiologists interpret images, they perform segmentation—that is, extracting subsets of overall image data, analogous to isolating and measuring specific portions of tissue. However, biological imaging inherently deals with blurry, noisy datasets. Even high-resolution scans with 1-cubic-millimeter voxels average signals from approximately 1 billion cells, creating inherent challenges in precise measurement.

Sullivan described that the clinical consequences of these measurement challenges become apparent in routine diagnostic tasks. For instance, detecting and measuring lung nodules on computed tomography (CT)


scans requires identifying which pixels define the nodule’s boundaries, particularly its edges. This determination directly affects size measurements, which carry significant clinical implications for patient management decisions.

Sullivan then presented on the Quantitative Imaging Biomarkers Alliance (QIBA), established by the Radiological Society of North America, which works to improve the value and practicality of quantitative imaging biomarkers (QIBs) by reducing measurement variability. QIBs provide objective quantities characterizing biological features, functioning as “imaging assays” analogous to laboratory blood tests.

Like laboratory assays, imaging assays require characterization of both technical performance—evaluated against reference objects of known characteristics—and clinical performance, determined through clinical trials. QIBA’s Metrology Working Group has focused on technical performance, emphasizing that no measurement result is complete without an accompanying statement of uncertainty—a principle often neglected in imaging data reporting.

Sullivan highlighted that the most critical technical parameters for QIBs are bias, linearity, and precision. Precision encompasses both repeatability (under identical conditions) and reproducibility (under varying conditions including different locations, operators, or measuring systems). For clinical decision making, reproducibility proves most relevant, although the required magnitude varies significantly with intended clinical use.
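One standard precision summary in this metrology framework is the repeatability coefficient. The sketch below uses hypothetical test-retest tumor-volume measurements (not QIBA data) and the conventional paired-difference estimate of within-subject standard deviation.

```python
import math

# Hypothetical test-retest tumor-volume measurements (mL) for 5 subjects,
# each scanned twice under identical conditions (repeatability setting).
pairs = [(10.2, 10.6), (5.1, 4.8), (20.0, 19.5), (8.4, 8.9), (12.0, 11.7)]

# Within-subject SD from paired differences: wSD = sqrt(sum(d^2) / (2n)).
n = len(pairs)
wsd = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * n))

# Repeatability coefficient: the difference two repeat measurements should
# stay within 95% of the time, RC = 1.96 * sqrt(2) * wSD (about 2.77 * wSD).
rc = 1.96 * math.sqrt(2) * wsd
print(round(wsd, 3), round(rc, 3))
```

Reporting a QIB value together with its RC is one concrete way to satisfy the principle that no measurement result is complete without a statement of uncertainty.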

Sullivan noted that despite technical feasibility, quantitative imaging faces significant adoption barriers. Clinical reluctance persists even for mature, reliable QIBs, primarily because few clinical treatment decisions are actually driven by quantitative imaging results. While numerous papers suggest potential QIB applications, long-term validation data remain scarce, making clinicians appropriately cautious.

Broader resistance to standardization compounds these challenges. Inertia maintains existing practices without compelling reasons for change. Some resistance reflects unwillingness to accept external directives, while skepticism questions whether standardization truly improves outcomes. Professional concerns about ownership and credit create additional barriers, compounded by insufficient education for both radiologists and treating physicians.

He argued that regulatory gaps further complicate implementation, including the absence of national accreditation or certification programs ensuring quantitative imaging quality. These multifaceted challenges highlight the value of comprehensive solutions addressing technical, educational, regulatory, and cultural dimensions of medical practice.

Sullivan’s considerations for improving standardized quantitative imaging acceptance emphasize the importance of clear rationales based on published data or expert consensus where data are lacking. He stated that standards should reflect broad stakeholder input through open consensus processes and remain dynamic documents that undergo periodic revision and evolution.

He stated that the fundamental message emphasizes prioritizing and incentivizing reproducibility as essential for quantitative imaging success. Through rigorous attention to measurement precision and uncertainty quantification, quantitative imaging can fulfill its promise of transforming radiology from subjective interpretation to objective measurement, ultimately improving diagnostic accuracy and patient care outcomes.

Sullivan ended by remarking that this transformation likely involves sustained commitment to technical validation, clinical validation, education, and standardization efforts that address both the scientific and practical challenges facing quantitative imaging implementation in routine clinical practice.

CENTRALIZED IMAGING COLLABORATIONS FOR AI READINESS

Paul Kinahan, head of the Imaging Research Laboratory at the University of Washington Medical Center, spoke about the need for large amounts of well-curated data for use with AI algorithms and described one approach to collecting such data, which involves centralized imaging collaborations.


He began by noting that the Food and Drug Administration (FDA) has approved more than 1,000 AI algorithms for medical imaging and that more than three-quarters of those are for radiology uses. Why have so many imaging algorithms been developed and approved? One of the reasons, Kinahan said, is that AI has clearly demonstrated an ability to help in analyzing medical images. Another, he continued, is that there are established digital workflows and universal data formats—in particular, the Digital Imaging and Communications in Medicine (DICOM) standard for transfer and storage—for use with medical images, which generally are not present with EHRs and other forms of medical data.

However, for these AI algorithms to be as useful as possible, he continued, they require very large amounts of data—hundreds of thousands or even millions of images—for their development, training, and evaluation (Willemink et al., 2020). “Where are we going to get that?” he asked. It is hard, he said, because the types of data that are most useful for AI purposes are lacking—in particular, accessible, representative, and curated data. By “curated,” he meant data with the appropriate information attached—their provenance, context, and so on.

Many data are potentially available, he said, but most of them are in isolated imaging systems. Kinahan remarked that this works well for local clinical needs—for the diagnostic tasks at hand—but problems arise when researchers try to collect large amounts of data and try to make sense out of all of the different types of data, formats, and error bars, which may or may not be reported. There are also many obstacles to getting clinicians to share their data and images. “Sometimes it is a lack of motivation, like, ‘Why should we do this?’” Kinahan said. “Or even if we are inclined to do this, ‘how are we going to pay for it?’” And privacy concerns add to the challenge. The result, he said, is that many of the algorithms being developed are based on limited datasets and sample sizes.

For researchers using these imaging algorithms, the limited data available on which to train them lead to less-than-ideal results, Kinahan said. Several researchers have analyzed these failures and the obstacles to creating AI models with clinical utility (e.g., Heaven, 2021; Roberts et al., 2021), and they have noted that several things went wrong.

Kinahan stated that one issue was the poor quality of the data. Among the factors contributing to the poor data quality are mislabeled data, multiple unknown sources, duplicate data, no traceability, limited quality control, and lack of external validation. Other issues behind the poor performance of the AI algorithms, Kinahan said, included a lack of valid ground truth, a lack of communications between AI experts and biomedical experts, and problems with how representative the data were of the populations of interest.

Some large datasets of images do exist, Kinahan said. He assembled a list of some of the largest that had a track record, including the name of the database, whether it is free or paid, what modalities are included, whether the images are from multiple institutions or a single one, and the number of exams. An exam can contain anywhere from one image to thousands of images, depending on what the exam was, he noted.

Kinahan offered a few additional details on some of these datasets. For example, he said that the Cancer Imaging Archive is, in his opinion, an exemplar of curated data. “However,” he continued, “it is cancer-specific, so it does not have other diseases necessarily, although there may be some by happenstance.” The National Lung Screening Trial is extremely valuable, he said, but it only has CT images. And at least three companies sell access to data from multiple institutions and multiple modalities. Finally, of particular interest is the Advanced Research Projects Agency for Health (ARPA-H) INDEX project, which is under development specifically to serve as a resource for developers of AI imaging tools. The images in these collections represent just a small fraction of the available images, Kinahan noted. In 2023, he said, more than 600 million medical imaging procedures were performed at more than 12,000 imaging centers in the United States.

Kinahan then provided some details on the Medical Imaging and Data Resource Center (MIDRC). It was funded by the National Institute of Biomedical Imaging and Bioengineering and now also ARPA-H in response to the COVID-19 pandemic and the resulting recognition of a lack of easily accessible and usable imaging data. Kinahan is one of the principal investigators for the project, which is hosted by the University of Chicago.


The intent behind the establishment of MIDRC was, he said, to have “a curated set of data with provenance that’s available [so] you can get the general idea of the workflow.” To date, MIDRC has ingested more than half a million exams, he said, but it takes time, effort, and resources to process those, so they are not yet halfway through. Currently, about 190,000 exams are available for free. The images are curated in such a way that users can select the cohorts they are interested in, and the tool, Data Explorer, exists to help select data.

Kinahan noted that 20 percent of the data on MIDRC are permanently sequestered and not available to the public. These data are kept separate to allow for the testing or evaluation of algorithms, potentially for submission to FDA or other purposes, he said. The sequestered data are representative of the various populations in the full dataset. Furthermore, MIDRC has developed methods so that if people run tests on data and then come back to rerun the tests, they do not get the same data. “It is a randomly sampled with replacement process to be a statistically valid approach,” Kinahan said. This is only possible because of how the data are curated and maintained, he commented.
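The resampling idea Kinahan described can be sketched as follows. This is a simplified illustration, not MIDRC's actual mechanism: each evaluation run draws its own sample with replacement from the sequestered pool, so reruns do not see the identical data.

```python
import random

# Hypothetical sequestered pool of exam IDs.
sequestered = [f"exam_{i:04d}" for i in range(1000)]

def draw_test_set(seed: int, size: int = 200) -> list[str]:
    """Each evaluation run draws its own random sample with replacement,
    so rerunning a test does not reuse the identical data."""
    rng = random.Random(seed)
    return [rng.choice(sequestered) for _ in range(size)]

run1 = draw_test_set(seed=1)
run2 = draw_test_set(seed=2)
print(len(run1), run1 == run2)  # same size, different draws
```

Because the draws come from the same sequestered distribution, repeated evaluations remain statistically comparable while making it harder to overfit an algorithm to a fixed test set.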

Unfortunately, MIDRC and the other centralized collections today do not, in general, provide enough data to support many of the potentially valuable AI applications, Kinahan said. He briefly described two other approaches that could provide larger amounts of data.

The first was indexing. This is when a platform points to data from other sources. MIDRC currently runs the Biomedical Data Fabric Imaging Hub, which indexes directly into the Cancer Imaging Archive data, the Stanford Center for Artificial Intelligence in Medicine & Imaging data, and NCI’s Imaging Data Commons, which already holds almost 600,000 exams.

The second approach he mentioned was federated data sharing. In a federated data sharing system, the data are kept in the local institutions where they were generated, such as hospitals, while centralized model development takes place in a hub. “You can take the algorithms to train them there and update your models there,” Kinahan explained.
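Federated sharing can be sketched with a minimal federated-averaging loop. This is a hedged toy example, not any particular federated stack: each site fits an update on its local data, and only model weights, never the underlying images, travel to the hub.

```python
# Toy 1-parameter model y = w * x, fit by federated averaging across sites.

def local_update(w, site_data, lr=0.1):
    """One gradient step on site-local (x, y) pairs, minimizing squared error."""
    grad = sum(2 * (w * x - y) * x for x, y in site_data) / len(site_data)
    return w - lr * grad

def federated_round(global_w, sites):
    """Hub averages the site updates, weighted by local sample counts."""
    updates = [(local_update(global_w, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Hypothetical per-site datasets, all consistent with y = 2x.
sites = [[(1, 2), (2, 4)], [(3, 6)], [(1, 2), (4, 8), (2, 4)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, sites)
print(round(w, 3))  # converges toward 2.0
```

The design trade-off is the one Kinahan noted: data never leave the institutions, which eases privacy and sharing obstacles, but model development must cope with heterogeneous local datasets it can never inspect directly.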

In closing, Kinahan offered several summary points: Centralized repositories are valuable, but more data would likely be beneficial. Options include centralized, indexed, federated, and hybrid repositories. Curation is a fundamental aspect, and it requires resources. And the sustainability of any approach is a key component.

DATA MANAGEMENT TOOLS AND STRATEGY FOR RESPONSIBLE IMAGING AI

Dan Marcus, director of the Computational Imaging Research Center at Washington University School of Medicine, spoke about data management tools and strategies for use with responsible imaging AI.

Marcus began with a quote from Nobel laureate Geoff Hinton. In 2017, Hinton declared, “I think that if you work as a radiologist, you’re like Wile E. Coyote in the cartoon. You’re already over the edge of the cliff, but you haven’t yet looked down . . . It is just completely obvious that in 5 years, deep learning is going to do better than radiologists.” While AI has indeed made remarkable progress—with deep learning outperforming radiologists on specific tasks and tools like TotalSegmentator segmenting hundreds of organs in minutes—significant challenges remain, Marcus said.

Marcus stated that despite the hype, substantial issues persist in imaging AI. A 2025 National Institute for Health and Care Excellence guidance stated that “more research is needed on the AI-derived software to analyze chest X-rays alongside clinician review” and warned that AI-derived software should not be used for clinical decision-making support, as some commercial software for identifying lung abnormalities could lead to missed lung cancers (NICE, 2025). Marcus discussed key challenges: insufficient evidence of effectiveness and reliability, insidious algorithmic bias, questions about model generalizability when deployed locally, and concerns about effects on radiologist behavior and workflow.

Over the past 20 years, Marcus has focused on how imaging informatics can address these issues through XNAT, an open-source data management platform. XNAT serves as a comprehensive solution for managing


and organizing workflows—bringing data in from sources, moving them through controlled workflows, and eventually making them publicly available.

Marcus organized imaging data in XNAT using what he called “the feature stack”—a hierarchical structure with source data at the foundation, progressing upward through increasingly processed and detailed layers, culminating in biomarkers or phenotypes at the top.

Four primary sources provide the foundation: open data repositories (Cancer Imaging Archive, MIDRC, Hugging Face) that are typically free but have limited outcome information; commercial datasets from academic medical centers like Washington University in St. Louis’s CuriMeta, which are costly but well validated; institutional data from hospital systems, which are often free but require institutional access; and federated data systems that manage privacy concerns while introducing new challenges in model development.

XNAT’s Scout Data Explorer approaches data collection in a new way by indexing EHRs, picture archiving and communication systems, and pathology databases. A language vision model extracts features from both image pixels and radiology reports into a vector database, enabling natural language queries. As Marcus demonstrated, users can ask for “patients under 85 with lung cancer,” receive 768 cases, and then refine to “cases with emphysema but excluding pneumonia”—with the system using both image pixels and dictated reports to find matching patients.
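The retrieval step behind such natural-language queries can be sketched as nearest-neighbor search over embedding vectors. This is a hedged toy example, not Scout's actual stack; the exam names and three-dimensional vectors are invented stand-ins for model-produced embeddings.

```python
import math

# Hypothetical embedding index: each exam's images and report are mapped to a
# vector (a real system would use a language-vision model; these are made up).
index = {
    "exam_A": [0.9, 0.1, 0.0],   # e.g. "lung cancer, emphysema"
    "exam_B": [0.8, 0.0, 0.6],   # e.g. "lung cancer, pneumonia"
    "exam_C": [0.1, 0.9, 0.1],   # e.g. "normal chest"
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def query(qvec, k=2):
    """Return the k exams whose embeddings are closest to the query vector."""
    ranked = sorted(index, key=lambda e: cosine(qvec, index[e]), reverse=True)
    return ranked[:k]

# A query vector standing in for the embedded text "lung cancer, no pneumonia".
hits = query([1.0, 0.0, -0.5])
print(hits)
```

Refining a query, as in Marcus's demonstration, amounts to re-embedding the new phrasing and re-ranking against the same index, which is why the interaction feels conversational.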

Understanding data quality is crucial, but Marcus emphasized that the goal is not about finding only high-quality data. Rather, the goal is characterizing quality well enough to understand any analysis or model’s operating characteristics. XNAT deploys quality control tools like MRQy, which generates hundreds of metrics including signal-to-noise ratio and database variance. These containerized tools can run at scale, providing quality metrics on 50,000 exams quickly through user-friendly dashboards.
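A quality metric of the kind Marcus mentioned can be illustrated with a toy signal-to-noise computation. MRQy computes many richer metrics; this hedged sketch, on hypothetical pixel values, shows only the basic idea of using background variation as a noise estimate.

```python
import statistics

# Hypothetical pixel samples from one exam.
foreground = [210, 205, 198, 215, 202, 208]   # pixels inside the anatomy
background = [3, 5, 4, 2, 6, 4]               # air/background pixels

signal = statistics.mean(foreground)
noise = statistics.stdev(background)          # background SD as noise estimate
snr = signal / noise
print(round(snr, 1))
```

Computed at scale, even a simple metric like this lets a dashboard flag outlier exams for review rather than discarding them, consistent with the goal of characterizing quality instead of filtering for perfection.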

He then discussed data curation tasks—harmonization, labeling, annotation, sorting, and organizing—that typically consume 80 percent of graduate students’ time, leaving only 20 percent for innovation. Marcus noted that AI can be used to help with many curation tasks. His team built models that automatically align scan acquisition types to standardized labeling schemes using both DICOM metadata and image pixels, transforming weeks of curation work into minutes. Another tool classifies individual tumors by organizing datasets by tumor type and providing detailed image analysis including tumor location.

Computed data encompass radiomic features extracted from images—intensity-based features, histogram-based features, and textures. Marcus noted that this allows for easier transfer of data in the form of spreadsheets instead of massive data files. However, these features are sensitive to underlying data distribution, making proper characterization of preceding layers important.
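The compression Marcus described can be made concrete with first-order features. This is a hedged sketch on a hypothetical region of interest; real radiomics pipelines compute far larger feature sets, but the principle is the same: each lesion reduces to one spreadsheet row.

```python
import statistics

# Hypothetical region-of-interest intensities for one lesion.
roi = [12, 15, 14, 20, 22, 21, 19, 13, 14, 18]

# First-order ("histogram-based") features summarizing the intensity distribution.
features = {
    "mean": statistics.mean(roi),
    "std": statistics.pstdev(roi),
    "min": min(roi),
    "max": max(roi),
    "energy": sum(v * v for v in roi),
}
print(features)
```

Because every feature here is a function of the intensity distribution, any shift in scanner protocol or reconstruction in the layers below changes the numbers, which is exactly why Marcus stressed characterizing the preceding layers of the stack.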

At the feature stack’s apex are computed phenotypes—that is, biomarkers that align with actual disease attributes like molecular tumor characteristics. These represent the “end game” for many companies and laboratories. Marcus demonstrated this with glioma work, characterizing molecular biomarker status (genetic aberrations involving IDH1 and 1p/19q) to provide prognostic and treatment information. His team built a complete proof of concept spanning from hospital system source data through quality assessment, feature extraction, and automated tumor segmentation to tumor classification—all integrated into XNAT’s automated tooling.

Marcus closed by saying that many different approaches to AI exist, but all benefit from proper data characterization. Such characterization can be supported effectively through an informatics platform like XNAT. The platform transforms the traditionally labor-intensive, error-prone process of medical imaging data management into an automated, scalable system—enabling researchers to focus on innovation rather than data management. When a patient enters the system, the entire processing stream launches automatically, ending in patient classification and predictive insights that guide clinical decision making.


DISCUSSION

Chiang began by asking if the concepts discussed in the session—data quality, data quantity, metadata, data context, data management, and so on—are being taught to students in universities or if we have an education gap.

Sullivan noted that an education gap may exist for both users and radiologists. Many users—specifically referring physicians—may not understand that quantitative properties can be extracted from images: “They think of images as pictures, and they do not think of them as a source of quantitative metrics in the same way that they would think about getting lab tests on tissues or specimens.”

Marcus answered that one of the biggest education issues is the gap between the people who build AI and the people who consume it. For instance, the backgrounds of graduate students in his laboratory tend to be in computer science or electrical engineering, rather than biology or biomedicine, so they often do not have a clear idea about a radiologist’s workflow or even how to look at an MRI scan.

Chung said that there is a training gap. The people who know how to make the tools may not understand what tools are needed, while the people for whom the tools are designed may not know how to use them.

Kinahan described another type of gap. Radiologists and referring physicians are comfortable with the practice of referring a patient for a scan, the radiologist doing the scan and interpreting it, and then the referring physician getting the scan and the interpretation. “It is a system that works well,” he said. However, he added, if the goal is also to use all those radiological data for other purposes, such as building AI models, “then there is a gap in the knowledge of what’s needed for that, both in terms of the data integrity or curation as well as just how to manage large volumes.”

Li asked how to prevent AI model outputs from being “garbage,” given data quality issues, and whether AI could curate poor-quality data to “launder” them for training algorithms. Chung responded that the best approach is determining what question needs to be answered and finding appropriate source data with proper metadata, noting that, for example, EHR notes are clinicians’ inferences and recollections, not raw data. Kinahan confirmed that using “helper AI” to curate large, complex datasets has become a well-accepted practice, while Sullivan emphasized that even poor-quality data can be valuable if used in proper context with known quality parameters. Chung added that when AI models underperform in real-world applications, it is often due to different data quality between training and deployment rather than model failure.

To end the discussion, Chiang asked about situations in which AI is used to create a dataset that will in turn be used to train an AI model: “How do we ensure that bias and error stop being propagated?” Kinahan replied, “That’s where quality control comes in [to make] sure that you’re not using the same AI to curate the data as you’re going to [use to] train the data with.” In general, he continued, the AI models mentioned in the discussion would be doing very different tasks.

REFERENCES

Chung, C., and D. A. Jaffray. 2021. Cancer needs a robust “metadata supply chain” to realize the promise of artificial intelligence. Cancer Research 81(23):5810–5812.

Heaven, W. D. 2021. Hundreds of AI tools have been built to catch COVID. None of them helped. MIT Technology Review, July 30. https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic.

Herzog, R., D. R. Elgort, A. E. Flanders, and P. J. Moley. 2017. Variability in diagnostic error rates of 10 MRI centers performing lumbar spine MRI examinations on the same patient within a 3-week period. Spine Journal 17(4):554–561.

Keane, P. A., and E. J. Topol. 2018. With an eye to AI and autonomous diagnosis. npj Digital Medicine 1:40.

NICE (National Institute for Health and Care Excellence). 2025. Artificial intelligence–derived software to analyse chest X-rays for suspected lung cancer in primary care referrals: Early value assessment. NICE Guidance. https://www.nice.org.uk/guidance/hte12, last updated February 10, 2025.

Roberts, M., D. Driggs, M. Thorpe, J. Gilbey, M. Yeung, S. Ursprung, A. I. Aviles-Rivero, C. Etmann, C. McCague, L. Beer, J. R. Weil-McCall, Z. Teng, E. Gkrania-Klotsas, AIX-COVNET, J. H. F. Rudd, E. Sala, and C.-B. Schönlieb. 2021. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nature Machine Intelligence 3:199–217.

Steinkamp, J., J. J. Kantrowitz, and S. Airan-Javia. 2022. Prevalence and sources of duplicate information in the electronic medical record. JAMA Network Open 5(9):e2233348.

Thrower, S. L., K. A. Al Feghali, D. Luo, I. Paddick, P. Hou, T. Briere, J. Li, M. F. McAleer, S. L. McGovern, K. D. Woodhouse, D. N. Yeboa, K. K. Brock, and C. Chung. 2021. The effect of slice thickness on contours of brain metastases for stereotactic radiosurgery. Advances in Radiation Oncology 6(4):100708.

Willemink, M. J., W. A. Koszek, C. Hardell, J. Wu, D. Fleischmann, H. Harvey, L. R. Folio, R. M. Summers, D. L. Rubin, and M. P. Lungren. 2020. Preparing medical imaging data for machine learning. Radiology 295(1):4–15.
