Proceedings of a Workshop—in Brief
Artificial intelligence (AI) and automation are increasingly being used to aid biological discovery and biotechnology development. Robotic and remotely controlled equipment is being used to accelerate research, while AI is opening new opportunities to explore the natural world and inform efforts to build biological entities with useful capabilities. Such technologies are poised to drive beneficial advances in health, biomaterials, environmental remediation, biomanufacturing, agriculture, and other areas. However, these developments also raise new questions and potential risks. Researchers, policymakers, and the public have sought to examine how applying AI and automation in biotechnology might lead to new challenges for biosecurity, health and safety, the environment, the integrity of scientific data, and economic development and national competitiveness.
To examine these issues, the National Academies of Sciences, Engineering, and Medicine hosted a workshop titled Artificial Intelligence and Automated Laboratories for Biotechnology: Leveraging Opportunities and Mitigating Risks, on April 3–4, 2024. The workshop was organized by a planning committee of the National Academies under the auspices of the Standing Committee on Advances and National Security Implications of Transdisciplinary Biotechnology. Participants from government, academia, nonprofit organizations, and private industry communities gathered virtually and in person to explore the use of AI and automation in biological research and development (R&D); discuss considerations relevant to national security; and share perspectives on potential future pathways for technology and policy development. This Proceedings of a Workshop—in Brief provides the rapporteurs’ high-level overview of the event. The views contained in the proceedings are those of individual workshop participants and do not necessarily represent the views of all workshop participants, the planning committee, or the National Academies of Sciences, Engineering, and Medicine.
In kicking off the workshop, workshop planning committee Co-Chair Deepti Tanjore (Lawrence Berkeley National Laboratory) provided an overview of the goals and structure of the workshop. The series of presentations, panel discussions, and focused breakout group discussions aimed to explore current and potential future opportunities to leverage AI and automated laboratories for biotechnology in an inclusive manner. Participants were invited to consider the relevant capabilities, potential for misuse, approaches for forecasting future capabilities, and implications of investing in—or not investing in—the development or adoption of such technologies.
Joe Buccina (National Security Commission on Emerging Biotechnology [NSCEB]) offered context on NSCEB’s motivation for sponsoring the workshop. NSCEB, a congressional commission established in 2022, is charged with examining and making recommendations to the U.S. government on the critical intersection of biotechnology and national security. To date, the commission has developed an interim report and other products to identify possible policy directions in areas relevant to biotechnology. As the NSCEB plans for its final report, Buccina said that the commission aims to use the discussions from the workshop to inform its policy recommendations. Specifically, the NSCEB seeks to elucidate how U.S. policy could help to leverage opportunities afforded by AI and automation in biotechnology while also managing risks.
In terms of opportunities, Buccina highlighted ways these technologies could strengthen the biotechnology sector by improving the productivity of people and laboratories; enabling faster scale-up of biomanufacturing; speeding data generation to supply training sets for AI tools; and sparking public-private and international partnerships. One concern he noted is the potential for other countries to outpace the United States in developing technologies. The use of these technologies in biotechnology also raises other risks, including the potential for AI models to produce inaccurate outputs, for AI to facilitate the development of a harmful biological agent, and for increased automation to increase cybersecurity vulnerabilities, he said.
Participants shared examples of how automation and various types of AI (e.g., computer vision AI, generative AI, other forms of machine learning, and more) are utilized in biological research and biotechnology. Participants examined the unique contributions these technologies can make in facilitating basic research and advancing translational R&D in areas such as medicine, materials, and consumer products.
Nobel Laureate Frances H. Arnold (California Institute of Technology) highlighted how AI and automation can accelerate the development of new functional proteins, including through directed evolution. While scientists are now adept at reading, writing, and editing DNA, Arnold said that the field is still in the early phases of composing DNA from scratch to engineer desired biological capabilities beyond what is found in nature. Focusing on proteins—the “molecular machines” that act as the functional agents of biological processes—Arnold’s lab has developed techniques to mimic the process of evolution, making small changes to the DNA sequences that encode naturally occurring proteins to gradually push the proteins toward new functionality.
This approach, known as directed evolution, has yielded numerous useful enzymes that have been incorporated into a variety of consumer products, materials, and research tools. However, the process remains time and resource-intensive, Arnold said, noting that predicting where a mutation should be made to improve enzyme function and the downstream screening for functional improvements remain significant challenges. Arnold outlined how AI and automation could accelerate directed evolution by allowing scientists to explore more efficiently what functions exist in nature, predict what other functions are possible, and more readily optimize new enzymes and chemistries. Ultimately, she suggested that it should be possible to use AI to model the landscape of DNA sequence and protein function and to create new proteins via a learning cycle without requiring human involvement, although she noted that humans and AI each have unique contributions and can help to fill gaps or identify opportunities that the other may miss.
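The mutate–screen–select cycle at the heart of directed evolution can be sketched in a few lines of code. The example below is a toy illustration only: the fitness function stands in for a wet-lab activity screen, and the target sequence, library sizes, and scoring are invented for demonstration, not drawn from Arnold’s actual methods.

```python
import random

random.seed(0)

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKVLAT"  # hypothetical "ideal" sequence standing in for a desired activity


def fitness(seq):
    """Stand-in for an experimental screen: here, similarity to a target.
    In a real campaign this would be a measured enzymatic activity."""
    return sum(a == b for a, b in zip(seq, TARGET))


def mutate(seq):
    """Introduce one random point mutation, loosely mimicking error-prone PCR."""
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(AMINO_ACIDS) + seq[i + 1:]


def directed_evolution(parent, rounds=20, library_size=50):
    """Each round: generate a mutant library, screen it, keep the best variant."""
    for _ in range(rounds):
        library = [mutate(parent) for _ in range(library_size)]
        best = max(library, key=fitness)
        if fitness(best) > fitness(parent):  # accept only improvements
            parent = best
    return parent


evolved = directed_evolution("MAAAAA")
print(evolved, fitness(evolved))
```

Because the loop accepts only improvements, fitness never decreases across rounds; the AI-assisted version Arnold described would replace the random `mutate` step with model-guided proposals over the sequence–function landscape.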
Arnold described how accelerating the creation of new enzyme capabilities could open vast new opportunities to address societal needs. She highlighted how these advances could be leveraged to develop new medicines, address environmental pollutants, or produce renewable fuels and materials. Arnold described the ability to create enzymes from scratch as a grand challenge and said that a lack of sufficient data, properly formatted to build AI models, is the most significant limitation in realizing this goal.
Arnold predicted that scientific achievements in this space will outpace the ability to implement solutions at scale. For example, if scientists were to develop bio-derived enzymes capable of making materials that could replace petroleum-derived plastic, actually using that
process in practice would likely involve retooling many established industries, a significant logistical and economic challenge. She also noted that the scientific workforce may struggle to keep pace since technology advances faster than academic training pipelines.
Adrienne Hoarfrost (University of Georgia) and Andreas Andreou (Johns Hopkins University) discussed how AI and automation can accelerate basic science discoveries by increasing productivity and helping researchers focus their efforts on areas likely to be fruitful while avoiding dead ends.
Hoarfrost described how AI can speed discovery of microbial functional capabilities through enhanced predictive computation and self-accelerating experimental “closed loops,” in which experimental data outputs are fed back into a starting AI model to improve the model over time. The microbial world displays a rich array of capabilities, she said, but understanding this diverse functionality and putting it to use remain limited. She noted that predicting functionality with computational approaches is difficult, while screening for functionality experimentally is expensive. Noting that less than 1 percent of microbes can currently be cultured, Hoarfrost said that the majority of microbial functionalities remain unknown. This has hampered scientists’ ability to understand gene function; for example, Hoarfrost said that in her work on marine microbes, only about 30 percent of genes in a typical ocean sample are annotated. In addition to being incomplete, she added, the reference databases used to understand gene functionality can be biased or inaccurate.
Hoarfrost highlighted how AI and automation can help to enable the functional annotation of genes, pointing to her lab’s use of an AI tool to predict DNA sequence and protein properties, including function. Using AI to predict functionality can unleash new knowledge by breaking scientists’ reliance on the relatively small amount of genomic annotation data that is available, she said. She added that AI and automation can help scientists access more data by accelerating experimental screening with high throughput, repeatable experiments driven by a self-accelerating closed loop.
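A self-accelerating closed loop of this kind can be illustrated with a toy active-learning sketch, in which a simple surrogate model chooses which “experiment” to run next and each result is fed back to refine the model. Every function and value here is a hypothetical stand-in for real instruments and assays.

```python
import math


def true_activity(x):
    """Stand-in for a wet-lab measurement of, e.g., enzyme activity."""
    return math.sin(3 * x) + 0.5 * x


def predict(x, data):
    """Toy surrogate model: the value of the nearest measured point."""
    nearest = min(data, key=lambda p: abs(p[0] - x))
    return nearest[1]


def acquire(candidates, data):
    """Uncertainty proxy: choose the candidate farthest from any measurement."""
    return max(candidates, key=lambda x: min(abs(x - p[0]) for p in data))


# Closed loop: model proposes -> automated lab "measures" -> data updates model.
candidates = [i / 20 for i in range(21)]          # design space on [0, 1]
data = [(0.0, true_activity(0.0))]                # one seed experiment
for _ in range(5):                                # five rounds of the loop
    x_next = acquire(candidates, data)
    data.append((x_next, true_activity(x_next)))  # feed the result back in

errors = [abs(predict(x, data) - true_activity(x)) for x in candidates]
print(f"measurements: {len(data)}, mean model error: {sum(errors) / len(errors):.3f}")
```

The essential feature is that the model, not a fixed plan, decides where to spend the next experiment, which is what allows the loop to accelerate itself as data accumulates.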
Building on the concept of AI-enabled closed loops, Andreou discussed how AI has closed the loop from sensing to knowledge to behavior to environment. For example, he said, an approach termed neuromorphic cognitive computing can help to move from sensing to knowledge via perception, while model-based systems with composability and adaptation can help to move from knowledge to behavior via reasoning. He highlighted how these concepts have been demonstrated in research tools that link semiconductor technology with the life sciences to grow cells and organoids, yielding insights on factors such as toxicity. In a development he described as AI helping AI, he also described his team’s experiment using ChatGPT to produce code for a fully connected spiking neuron array.1
Chris Gibson (Recursion) and Melissa Kemp (Georgia Institute of Technology) highlighted examples of the use of AI and automation in developing and manufacturing therapeutics.
Gibson described how automating experiments and applying algorithms can speed drug discovery and more quickly weed out dead-end drug candidates toward a goal of producing better medicines at lower cost. Recursion uses automated laboratories to perform millions of experiments per week, digitizes the petabytes of high-dimensional data generated through these experiments, and then applies algorithms to make and test predictions about potential drug candidates. Recursion’s laboratories include a wide range of model systems and generate a wide range of data types, from omics data to imaging.
By efficiently exploring a vast array of chemistry and biology through automated experiments, Gibson said that the company seeks to move failure earlier in the drug development process and identify which avenues are likely to fail or succeed with the smallest possible number of experiments. Gibson added that Recursion is applying AI to make its rapidly proliferating collections of data and predictive tools more usable and accessible for its internal teams, helping to further increase productivity and speed discovery.
__________________
1 Tomlinson M., J. Li, and A. Andreou. 2024. Designing Silicon Brains using LLM: Leveraging ChatGPT for automated description of a spiking neuron array. arXiv:2402.10920.
Kemp described how AI and automation can enhance biomanufacturing for regenerative medicine technologies such as cell therapies, which have emerged as a potent therapeutic tool but remain expensive to develop and employ. The Center for Cell Manufacturing Technologies (CMaT), a National Science Foundation (NSF) funded collaboration of academic centers and industry members spanning the cell manufacturing value chain, seeks to propel technologies in this space, including by advancing testbeds for technology development. Cells are a dynamic and variable product, Kemp explained; small variations in the source material or processing can have major impacts on yield and quality. To increase success rates and decrease the risks involved in commercialization and scale-up of cell-based technologies, CMaT focuses on improving data science, real-time analytics, and process development for various engineered systems. Through these efforts, CMaT helps to bridge academic research and commercialization by addressing questions such as what should be measured, mined, and modeled to ensure quality; how and when to measure predictive critical quality attributes during processing; and what processes and automation tools can help to manufacture cell products at scale with high quality and low cost.
D.J. Kleinbaum (Emerald Cloud Lab) and Sajith Wickramasekara (Benchling) shared examples of ways in which technology is changing the practice of science and explored implications for research productivity and access.
Kleinbaum discussed how cloud labs—physical laboratories that can be controlled from anywhere in the world via software—accelerate experimentation and broaden access to scientific tools. Cloud labs incorporate automated and robotic components along with human technicians to run experiments continuously and in parallel. Kleinbaum described how his company’s processes for vetting users and experiments help to ensure that experiments are physically feasible and safe and do not raise biosecurity concerns.
Since experiments are performed exactly to specification, Kleinbaum said, a strength of cloud labs is that documentation is built in and experiments can be rapidly reproduced. He added that cloud labs broaden access to science by allowing researchers to use equipment that might be too expensive or otherwise infeasible to obtain at their own institution. A network of interconnected, interoperable, and continuously operating labs increases efficiency and productivity, ultimately driving down the costs of experimentation, he said. However, Kleinbaum noted that since each step must be fully planned ahead of time, learning to use cloud labs effectively can require a shift in mindset for researchers accustomed to “tinkering” in the laboratory to refine their experimental protocols. He identified this shift in experimental mindset—the holistic, end-to-end design and planning of an experiment at the outset, coupled with an inability to “tinker” during the experiment—as a major limiting factor in the adoption of cloud labs.
Kleinbaum suggested that today’s cloud labs are about midway along the trajectory from traditional laboratories to what he described as a fully AI-orchestrated lab. Today’s cloud labs provide on-demand experimentation at highly automated facilities that are fully controlled by software but run human-designed experiments. Moving forward, Kleinbaum outlined an increasing role for AI in enabling self-optimizing experiments, reinforcement learning-based experiment simulation, and AI-managed workflows.
Wickramasekara discussed the potential for modern scientific software to speed the process of science. His company, Benchling, develops software solutions aimed at increasing efficiency within the pharmaceuticals R&D pipeline. Noting that many available research software solutions were not built for biology or AI, Wickramasekara highlighted a recent Benchling study that found 84 percent of the companies surveyed have R&D teams that rely on custom-built software.2 These ad hoc software solutions, he said, often fail to achieve data interoperability and impose a significant drain on productivity. Reinventing research practices by replacing custom applications, streamlining workflows, and unifying data capture across silos can dramatically reduce the time it takes to complete scientific steps and increase research output. To truly unleash the power of AI for biology, Wickramasekara said that modern software will be critical for enabling scale in the wet lab, for example, by automating repetitive tasks, and for ensuring data is AI-ready for computational experiments in the dry lab. He described how this would fuel a cycle in which high-quality experimental data is used to train models, which then conduct high-quality computational experiments and inform hypothesis testing in a continuous closed loop.
__________________
2 Benchling. 2023. State of Tech in Biopharma. https://www.benchling.com/state-of-tech.
In panel and breakout discussions, participants identified additional opportunities to apply AI and automation to advance biology, along with metrics to gauge success toward these aspirations and obstacles that might be encountered.
Noting that the role of AI can be seen as augmenting but not replacing human scientists, some participants said that AI and automation offer potential for generating large-scale and reproducible data that can guide scalable but targeted experiments. Several groups underscored the value of AI for enabling prediction, for example, to forecast the impact of different agents or mitigations on biological processes or to predict phenotype from genotype in cells. Others highlighted the potential to tackle questions that are more complex than may be feasible without AI, such as completely modeling all chemical reactions inside a single cell. Some participants also highlighted aspirations for scaling biomanufacturing, for example, through the development of new chassis organisms or the use of automation to enable decentralized manufacturing in remote areas. In these efforts, many participants said that technologies can be productively applied to address urgent societal challenges in areas such as climate and energy, agriculture, novel antibiotics, affordable drugs, and biodegradable materials while helping to maintain U.S. competitiveness and global leadership in biotechnology.
To enable productive use of data, some participants suggested a need for novel tools that help to maximize the information that can be extracted from biological data, understand the accuracy of models in real time, integrate and increase interoperability of multimodal datasets, and increase explainability of insights generated with AI. Kemp said that coordinating data acquisition across laboratories can make it easier to integrate and mine larger data sets, and Kleinbaum added that it is essential to be able to tie data back to the experimental processes that generated it to ensure comparisons are appropriate. Several participants also highlighted the importance of developing technology with the end user in mind and building technical literacy among biologists. Others suggested that it will also be important to help regulators keep pace with developers and explore novel legal approaches for promoting the sharing of data and tools.
Participants discussed the investment environment that influences which technologies attract funding, including considerations related to projecting costs, benefits, return on investment, and impacts in the context of commercial, national security, or other types of objectives.
Chenny Zhang, an expert in biotechnology venture capital, described the differences between private and public investors in terms of motivators and metrics for success. In general, she said, private investors seek technologies with high financial value, while public investors seek technologies with high strategic value. Areas high in both will naturally receive the greatest investment, she said, and she suggested that the greatest opportunity for public investment could be in companies that have high strategic value but fall slightly under the threshold of financial value necessary to receive private investment. These technologies are at greatest risk of becoming missed opportunities that allow other countries to outpace the United States, Zhang said. To maintain U.S. leadership, she noted that government incentives could be used to increase the potential financial value of these technologies for private investors, such as by decreasing the time to market or by catalyzing market growth, to encourage private investment that would otherwise not be made.
Andy Kilianski (ARPA-H) described the role of ARPA-H as a public funder uniquely focused on high-risk, high-reward technology in the health space, an area where AI holds significant potential. Noting that public health is a major pillar of national security—and an area where the United States is not outpacing its competitors—he
stressed the importance of supporting better health for more people. In particular, he said that AI can enable new treatment breakthroughs and clinical decision support tools to better meet the needs of patients. To help fill the gaps left by the traditional models for public investment in health, he said that ARPA-H focuses on getting insights from the lab into the hands of patients by investing in the development of technologies with high commercial potential, helping investors to understand their potential value, and forming partnerships to transition technologies into private industry within 3–4 years. This includes work to pull technologies into the health space from other areas and push health technologies into markets that do not yet exist.
Companies use competitive intelligence to consider how external developments such as evolving technological capabilities, regulatory frameworks, and the activities of competing businesses might affect their future business trajectory. Describing her role in helping pharmaceutical companies use external data to inform decision making, Jolene Lau (BioMarin Pharmaceutical Inc.) discussed how competitive intelligence draws upon a spectrum of components ranging from those that are easiest to automate to those that are more subjective. Facets of this process that are most easily automated are those related to information gathering and synthesis; Lau noted that both AI and non-AI based tools are available to mine public or semi-public data sources and package the information to help companies gauge what their competitors are doing. A facet that may require more human analysis or guidance but can be assisted by AI tools is determining what the impact of those competitor activities might be, for example, by considering the context of the patient populations that might be affected or the feasibility of conducting clinical trials. Determining how a company responds to external developments is the most subjective step in the process, Lau said, and the step that is least amenable to automation. In setting strategy, she emphasized that businesses must define who they are as a company, where they want to be in the future, and how they will measure success, decisions that are based in values and subjective judgments.
Panelists discussed how different stakeholders think about impact when assessing and investing in technologies. Zhang and Kilianski said that platform technologies can enable advances across the entire biotechnology ecosystem and as such are often seen as having the greatest potential impact. However, Zhang noted that platform technologies sometimes do not attract as much private investment as product technologies because they might not have an immediate product for consumers, and therefore they could be a good target for government investment. In the context of health, Zhang, Lau, and Kilianski stressed that a key metric for success is speeding the development of therapeutics and improving patients’ lives. Kilianski said that ARPA-H prioritizes areas where there are particular unmet needs, and he and Zhang expressed the view that closing health disparities in access to care (and being careful not to exacerbate them) is also an important marker of success. Lau and Wickramasekara said that technologies that make researchers’ everyday tasks faster and easier have high potential for impact and can also reduce burnout and attrition in pharmaceutical research. Wickramasekara added that enabling scientists to ask questions they could not ask before is another important area for impact, although it is more difficult to measure.
Some participants noted that hype is an important concern as investors seek to capitalize on the benefits of AI without falling prey to unrealistic expectations, and that hype can sometimes drive unwarranted fear of a new technology. Lau suggested that articulating benchmarks and grounding decisions in what has actually been achieved can help companies and investors distinguish between hype and reality. She also noted that different companies have different orientations; while some seek to lead the push into new applications of technology, others prefer to wait and see what is most likely to pay off.
Throughout the workshop, participants considered ways that emerging technologies might lead to new vulnerabilities detrimental to scientific productivity, human health and safety, national security, and U.S. competitiveness. Participants discussed approaches to understanding these potential vulnerabilities, identified actors in the R&D pipeline who may be in a position to prevent the misuse of biotechnology, and suggested potential actions they could employ.
James Diggans (Twist Bioscience) outlined the need for a shared understanding of biological risk and tools to address it. Describing current challenges in implementing biosecurity protections in the context of DNA synthesis, he said that the advent of AI-assisted novel structures could make existing challenges vastly more complex. With current technologies, implementing biosecurity best practices involves a substantial amount of human effort, and Diggans said that the lack of a shared view on how to estimate and communicate about biosecurity risks leads to inconsistencies in implementation. Like some other synthetic DNA companies, Twist Bioscience screens both its customers and the sequences they plan to synthesize, but the question of whether a particular sequence poses a dual-use or biosafety risk becomes harder to answer as the scale increases. The protein design space is enormous, Diggans said, and adversaries seeking to engineer harmful new constructs could likely evade detection by keeping their designs far removed from known examples of harmful functions. To counter this, Diggans suggested a need to move from a focus on known functional components to predicting function; however, a remaining challenge lies in determining what factors contribute to biological vulnerability and how screening efforts might be prioritized or targeted based on what functionalities are most likely to be exploited for harm.
For researchers and companies to extract value from biotechnologies, people must be able to trust the data on which those technologies are based. Positing that a lack of data integrity is the single biggest threat to the bioeconomy, Charles Fracchia (Bioeconomy Information Sharing and Analysis Center, BIO-ISAC) suggested a need for new methods for data integrity and verification. He stated that a lack of data integrity makes misuses of biotechnology, whether intentional or accidental, far more likely. For example, he said, it is important to guard against intentional poisoning of biological datasets with fake data. He suggested that techniques developed in computer science, such as fingerprinting, version control, and secure network designs, could be implemented in biology to provide assurance. Fracchia also stressed that it is vital to enforce cybersecurity requirements to ensure data integrity, something that is currently implemented unevenly. He also suggested that bodies such as the National Institute of Standards and Technology could reevaluate security provisions as fast-moving technological developments unfold, that the National Bioeconomy Board could assess threats at least annually, and that the U.S. government could work to promote norms and standards internationally.
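As one illustration of the fingerprinting technique mentioned above, a content hash can detect any alteration to a dataset record. The record fields below are hypothetical, and the sketch uses only standard-library hashing; real deployments would add signatures, provenance metadata, and secure key management.

```python
import hashlib
import json


def fingerprint(record):
    """Content hash of a dataset record; any tampering changes the digest."""
    canonical = json.dumps(record, sort_keys=True).encode()  # stable serialization
    return hashlib.sha256(canonical).hexdigest()


# A hypothetical sequencing result as it leaves the instrument.
record = {"sample": "ocean-042", "sequence": "ATGGCGTTAC", "instrument": "seq-7"}
digest_at_origin = fingerprint(record)

# Later, a consumer re-derives the digest to verify nothing changed in transit.
assert fingerprint(record) == digest_at_origin

# A single poisoned base yields a completely different fingerprint.
tampered = dict(record, sequence="ATGGCGTTAG")
assert fingerprint(tampered) != digest_at_origin
print("integrity check passed")
```

Publishing such digests alongside datasets, or storing them in a version-control system, gives downstream users a cheap way to detect the kind of data poisoning Fracchia warned about.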
Outlining a goal of making biology easy to engineer but still hard to misuse, Tessa Alexanian (International Biosecurity and Biosafety Initiative for Science [IBBIS]) offered five strategies for preventing malicious actors from leveraging AI and automation to aid biological weapons development.3 First, she said that it is important to develop ways to identify in silico and automated experiments of concern. Second, she suggested monitoring for ways in which AI may lower knowledge barriers and make it easier to design and experiment with potential biological weapons. Third, she said that implementing fair and widespread customer screening and robust, universal sequence screening could help to secure the digital-to-physical transition. Here, Alexanian highlighted the difficult challenge of balancing access and security concerns, that is, the challenge of enabling access for legitimate actors and uses while preventing access to technologies for misuse. Fourth, she suggested that hardware and cybersecurity measures could help prevent the misuse of robotic laboratory equipment. Finally, she outlined opportunities to make it easier to prevent and catch misuse through secure record-keeping, proof-of-origin documentation, and screening and monitoring. She outlined specific roles for policymakers, lab leaders, model and protocol developers, hardware developers, and synthesis companies in each step.
In panel and breakout discussions, participants considered roles for government and industry in solving biotechnology security challenges while continuing to enable innovation. On the part of government, some participants suggested a need to define security concerns and to invest in early detection strategies and research that aims to make the impacts of biology more predictable. Others highlighted a role for government in establishing ontologies for biological data, and defining and regularly re-evaluating research oversight frameworks.
__________________
3 Rose, S. and C. Nelson. 2023. Understanding AI-facilitated biological weapon development. The Centre for Long-Term Resilience: London UK.
Some participants stressed the importance of facilitating dialogue between industry and government. They suggested that government and industry could work in partnership to establish biosecurity best practices and consensus standards, clarify export controls on biotechnology, and cultivate a culture that encourages monitoring and reporting concerns. On the part of industry, some participants emphasized the importance of proactively communicating with stakeholders, including reporting to government and helping to manage consumer messaging around security. They also suggested that a certification could be established to reward companies that meet or exceed security guidelines.
In a state of rapid change, how can developments in AI and automation be anticipated, what factors might drive or hinder those developments, and what new risks may emerge? Participants discussed features that encourage innovation; gaps or blockers that can affect capabilities; challenges and opportunities in driving the adoption of new products, technologies, and processes; and approaches to forecasting technological developments and associated risks.
Speakers highlighted how factors such as compute capabilities, access to funding and infrastructure, and incentives from governments and markets may influence innovation and technology adoption in the coming years.
Although major biological AI models are currently about 100-fold smaller in scale than natural language models like GPT-4, Tamay Besiroglu (Epoch AI) posited that current growth trajectories in both compute capabilities and biological data suggest that the performance of AI systems in biology is likely to continue to improve rapidly and predictably for at least the next 3–4 years. Since 2010, the amount of computational power used to train major AI systems in general has doubled about every six months,4 an acceleration compared to past decades that he said has been driven largely by the advent of deep learning. Growth has been even faster in biological AI models: compute has doubled about every four months for major biological sequence models, with especially rapid growth in protein language models.5 Since model performance is tied to compute, Besiroglu said that performance is likely to improve with additional growth in compute power. Additionally, he said, the stock of sequence data needed to train biological AI models is large and continues to grow rapidly. Besiroglu said that there are already about 7 billion protein sequence data entries in key databases, and it is estimated that DNA sequence data and metagenomic data are growing at 31 percent and 20 percent per year, respectively, providing plenty of data to support scaling for at least several years.
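The growth figures cited above imply concrete multiples, which can be checked with a few lines of arithmetic: a six-month doubling time corresponds to 4x growth per year, a four-month doubling time to 8x per year, and annual growth rates of 31 percent and 20 percent to doubling times of roughly 2.6 and 3.8 years, respectively.

```python
import math


def annual_factor(doubling_months):
    """Yearly growth multiple implied by a fixed doubling time."""
    return 2 ** (12 / doubling_months)


def doubling_time_years(annual_growth_rate):
    """Years to double at a constant percentage growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


print(f"compute, 6-month doubling: {annual_factor(6):.0f}x per year")
print(f"compute, 4-month doubling: {annual_factor(4):.0f}x per year")
print(f"DNA data at 31%/yr doubles every {doubling_time_years(0.31):.1f} years")
print(f"metagenomic data at 20%/yr doubles every {doubling_time_years(0.20):.1f} years")
```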
Theresa Good (NSF) outlined several drivers of innovation and discussed the roles she sees for human researchers and for AI in advancing biological research and biotechnology. She highlighted NSF's growing focus on empowering researchers to drive advances, democratizing access to the resources needed to fuel science and innovation, and facilitating the translation of technological innovations to benefit society through public-private partnerships. She emphasized that access to infrastructure and tools, such as robotic equipment and compute power, is an essential driver of innovation because these resources attract creative people and empower them to make progress more quickly. This is especially true, she said, when access is sustained and researchers can spend more time actively doing research and less time writing grant proposals. She further described the merits of geographically diversifying this access and biotechnology infrastructure in general. Pointing to data as another key enabler, Good underscored the importance of implementing FAIR data principles (findability, accessibility, interoperability, and reusability), including supporting open access for publicly funded data and employing standards and metadata to ensure reliability and support reuse. To drive innovation while mitigating potential biosecurity risks, she suggested a need for more computing power, more work to determine which types of measurements can feasibly be automated, more genotype-to-phenotype research to elucidate how sequences might raise security risks, and a greater focus on the possible end uses of research. While she said that AI can accelerate research, she posited that some experiments will still benefit from human creativity. She also added that the shift toward greater adoption of AI and automation in biology raises new workforce needs, which could lead to a skills gap.
__________________
4 Epoch AI. “Parameter, Compute and Data Trends in Machine Learning.” Accessed May 13, 2024 at: https://epochai.org/data/epochdb/visualization.
5 Maug, N., A. O’Gara, and T. Besiroglu. 2024. Biological Sequence Models in the Context of the AI Directives. Accessed May 13, 2024 at: https://epochai.org/blog/biological-sequence-models-in-the-context-of-the-ai-directives.
Rob Carlson (Planetary Technologies) posited that biotechnology holds the potential to drastically alter social and economic structures and highlighted some of the risks that may be encountered if countries do not fully understand these potential impacts. He described biology as a general-purpose technology that can be stacked to produce multiple layers of value (for example, the use of bioreactors to manufacture cells that then make desired compounds). However, he said that U.S. investments in biology lag behind other fields with general-purpose technologies, that the field lacks a clear national strategic vision and is instead being driven primarily by private-sector R&D investments, and that it remains poorly measured compared to other parts of the economy. On AI and biotechnology, he noted that one problem is that many of the claims made about the potential applications of AI are not achievable anytime soon, creating hype that dilutes the conversation around AI applications and their implications. The key to achieving AI goals lies in high-quality, high-precision datasets to train models, which reduce error and improve reliability, and Carlson said that increased automation will be instrumental in generating data with greater reproducibility. However, he said, automation also comes with risks. Carlson suggested that one key question from a security perspective is how distributed biomanufacturing technology might be effectively regulated or monitored without inadvertently incentivizing the insecurity that governments aim to curtail. For example, he said, previous action on DNA synthesis screening guidelines led to a new tier of providers, mostly international, that specifically catered to avoiding those guidelines.
Forecasting can help policymakers and other stakeholders anticipate risks and take steps to mitigate them. Ezra Karger (Forecasting Research Institute) and Philip E. Tetlock (University of Pennsylvania) shared insights from their research on forecasting the risks associated with AI. In a series of exercises, researchers recruited groups of non-expert participants and groups of experts to forecast risks, identify key questions whose answers would influence their forecasts, identify policy ideas that could potentially mitigate risks, and forecast how effective those policies might be. The early phases of this research yielded insights into the best ways to ask questions about future risk and into what features make someone a good forecaster. When talented generalist (non-expert) forecasters were compared with expert forecasters, the non-experts were, on the whole, less concerned than the experts about the potential for AI to cause catastrophic impacts during this century, although there was large variability in each group and no consensus regarding the level or types of risks expected.6,7 The researchers found that despite monthslong efforts to facilitate meaningful dialogue, few participants demonstrated a willingness to understand the other side's views or to change their own views in light of someone else's arguments.
Reliable data, in enormous quantities, is an important component of developing useful AI models. Erika Alden DeBenedictis (Francis Crick Institute) founded the nonprofit organization Align to Innovate to improve the biological datasets available for training by building partnerships and experimenting with new ways to incorporate automation into data collection. Pointing to DeepMind's AlphaFold 2 as one of the most successful AI systems in biology, DeBenedictis said that large, high-fidelity datasets will be critical to enabling similarly successful models in the future. AlphaFold was made possible by the Protein Data Bank, which represents about $10 billion worth of data. Other ingredients for success include a well-defined problem and a match between the structure of the problem and machine learning capabilities. To enable more predictive capabilities in biology, DeBenedictis said that it is important for the scientific community to consider which datasets are most feasible to obtain and then align efforts to reproduce, scale, and share data. For example, she suggested that the community could focus on data to support models that predict how well a protein will express in a particular microbe (a capability that would be valuable for both research and biomanufacturing) or that predict the effect of point mutations on protein function. Since predictive models are biased by their training sets, she emphasized that the strength of such models and their applicability to various scientific goals will depend heavily on the data used to train them.
__________________
6 Karger, E., J. Rosenberg, Z. Jacobs, M. Hickman, R. Hadshar, K. Gamin, T. Smith, B. Williams, T. McCaslin, and P.E. Tetlock. 2023. “Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament.” Forecasting Research Institute Working Paper.
7 Rosenberg, J., E. Karger, A. Morris, M. Hickman, R. Hadshar, Z. Jacobs, and P.E. Tetlock. 2024. “Roots of Disagreement on AI Risk.” Forecasting Research Institute Working Paper.
James Wang (Creative Ventures) highlighted the critical role of data in determining which technologies and companies attract investment and which do not. AI-based biology applications have a very different business model than the largely software-oriented companies that venture capitalists have focused on over the past 10–15 years, Wang said. AI requires spending large amounts of money to acquire data and train models before a company can begin serving customers, and serving those customers then requires further spending on model inference. Added to the fact that biology in general has high upfront and marginal costs, especially compared to software, companies in this space often require large investments to get off the ground. In selecting which companies are most likely to succeed, Wang said that investors are likely to favor platform technologies that can enable innovation on multiple fronts. He also stressed that a company's pricing power stems from its proprietary data. Rather than compute power or the AI models themselves, he said, the differentiator for startups is the type and quantity of data they can use to train their models: the harder data is to collect, the more value can be extracted from it. This can pose a challenge for companies in the context of public datasets or moves toward greater data sharing, and Wang said that it is important to consider how data access and ownership interact with private investment and with the broader context of economic growth at the national level.
With data being so central to the value of AI-based biotechnologies, some participants noted that there remain many unanswered questions about the implications of data sharing and the optimal way to approach it. Kilianski said that ARPA-H aims to make data and tools as openly accessible as possible, but acknowledged that it is a challenge to make the outcomes of government investments publicly available while also giving companies the opportunity to monetize data products. He said that a main goal is to help accelerate the entire biotechnology field, not only particular commercial entities. Carlson and Zhang added that open sharing of data generated with public support can allow other countries to reap the benefits of U.S. investments, in addition to supporting the use of that data by U.S. researchers and companies.
Suggesting that questions around data sharing cannot be left to the scientific community to solve alone, Andreou said that it will be important for government to play a role in setting policies about data and metadata access. Several participants posited that these questions will only grow more important and urgent as cloud labs become more sophisticated and more accessible; the metadata that can be associated with data generated in such environments makes the data even more usable and reusable, increasing its value for research and applications. Kleinbaum expressed his view that research data and methods from publicly funded research should be shared to the greatest extent possible, something made easier with cloud labs, but noted that there is not yet a solution for a repository where all that data could be shared and accessed. Adding that patient data can be especially valuable for the pharmaceutical industry, Lau pointed out that there are multiple models for collecting and sharing patient data, including biobanks that are managed by governments or nonprofits and made accessible to industry for a fee. She suggested these models could offer opportunities for expansion or improvement. Wickramasekara added that some researchers use federated approaches that allow them to learn from the data without actually exposing it.
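The federated approach mentioned above can be illustrated with a minimal sketch (not described at the workshop) of federated averaging: each site fits a model on its own private data and shares only model parameters with a coordinator, never the underlying records. The linear-regression task, the three synthetic "sites," and all numerical settings here are illustrative assumptions.

```python
# Minimal federated-averaging sketch: sites share model parameters, not data.
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, steps=20):
    """Run gradient-descent steps for linear regression on one site's private data."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Three "sites," each holding private data drawn from the same true model.
true_w = np.array([1.5, -2.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    sites.append((X, y))

# Federated rounds: broadcast global weights, average the locally updated copies.
w_global = np.zeros(2)
for _ in range(10):
    local_ws = [local_update(w_global, X, y) for X, y in sites]
    w_global = np.mean(local_ws, axis=0)

print(w_global)  # converges close to true_w without any site sharing raw data
```

The design choice worth noting is that only the parameter vectors cross site boundaries; in real deployments this is typically combined with secure aggregation or differential privacy, since model updates themselves can leak information about the training data.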
Overall, Wang said that the model of providing government support for basic research while letting the commercialization of the products of that research happen in the private sphere has been a successful approach, noting that many of the companies his firm invests in are based on technologies that emerged from government-supported basic research. However, it can be tricky to strike the right balance when it comes to the government’s role in facilitating the translation of research insights into commercial products and businesses. While governments can help to create the potential for commercialization,
Wang cautioned that it is important for governments to avoid artificially picking winners or continuing to invest in technologies and companies that have no inherent value, as such investments create counterproductive incentives that sustain market ecosystems without generating innovation and growth. Good noted that NSF is experimenting with various models for facilitating commercialization, and Carlson added that one challenge in these efforts is that the U.S. has not clearly articulated what the “national interest” means in the context of biotechnology and the bioeconomy.
Building on the workshop discussions, breakout groups discussed “grand challenge” ideas on difficult but important problems for the field to address that could inform R&D approaches and investments over the next decade.
Some grand challenge ideas discussed were related to the advancement of AI and automation in science more broadly, not only in biotechnology. One idea many participants highlighted was the development of an “AI scientist” that autonomously generates scientific knowledge from start to finish (i.e., from hypothesis generation, to automated experimentation, to data analysis and reporting) and is capable of producing scientific papers that pass a human peer-review process. Others suggested ideas on facilitating free access to cloud labs and the use of physical sensors to enhance data quality and reproducibility.
Other ideas for grand challenges were related to the application of AI and automation to achieve advances in biology and biotechnology. Some participants suggested the grand challenge of developing digital twins to simulate and predict the behavior of biological systems, including to replace the need for animal studies. Others suggested advanced predictive models for whole genomes, protein folding, cell phenotypes, and ecosystem regeneration and resilience. Additional suggestions included studying bacteria that cannot currently be cultured; sequencing and mapping the environmental microbiome across the globe; and creating fit-for-purpose enzymes from scratch. Other participants suggested ideas related to biomanufacturing, including using self-driving labs to reduce batch failure and to promote supply chain resilience.
Many grand challenge ideas discussed were related to broad societal impacts. Examples in health that were articulated by some participants included the end-to-end development of biologics and vaccines; achieving AI-enabled personalized medicine, emergency triage, and effective clinical decision-making; and the forecasting and detection of pathogens. Others discussed grand challenges related to increasing scientific literacy, supporting inclusion, broadening participation in the bioeconomy, and ensuring that the products and processes created in the pursuit of these grand challenges increase equity in access and impact.
Workshop Planning Committee Co-Chair Amina Qutub (University of Texas, San Antonio) offered closing reflections and a synthesis of the workshop's discussions. Qutub noted that, over the course of the workshop, participants examined the many ways in which AI and automation are transforming the biosciences by speeding up experiments and reinventing how R&D is performed, bridging between the digital and physical worlds, augmenting decision making, and more. The opportunities afforded by these technologies are found throughout the design-build-test cycle: in supporting ideation, data organization, and project definition; predicting fruitful areas of research and eliminating research directions likely to yield negative results; data gathering and design optimization; system creation and partner integration; and quality control and data analysis. They can also create closed loops to help scientists and engineers iteratively discover, learn, build, and improve, Qutub said. Many participants noted that these approaches can contribute to solving many pressing societal challenges.
The discussions also surfaced key challenges, Qutub described, including generating the biological data needed to develop AI models while ensuring data integrity and minimizing biases; building out the digital infrastructure researchers need to effectively adopt emerging data and tools; facilitating data sharing in ways that support economic development and growth in light of the different drivers of investment in public and private spheres; and navigating the tensions and tradeoffs
between innovation and security, including the balance between open data access and concerns for misuse. Many participants also highlighted how AI and automation may impact the scientific workforce, including by changing scientific practices, which raises questions about how best to adapt educational pipelines to equip the next generation of scientists with the skills to leverage these technologies effectively and critically.
COMMITTEE MEMBERS Amina A. Qutub (Co-Chair), University of Texas, San Antonio; Deepti Tanjore (Co-Chair), Lawrence Berkeley National Laboratory; Douglas Densmore, Boston University; Jessica Dymond, IQT; Greg McKelvey, Jr., RAND; Arvind Ramanathan, Argonne National Laboratory.
STAFF Andrew Bremer, Responsible Staff Officer; Kavita Berger, Board Director; Jessica De Mouy, Research Associate; Nia Johnson, Senior Program Officer; Christl Saunders, Program Coordinator; Trisha Tucholski, Program Officer; Nam Vu, Senior Program Assistant.
DISCLAIMER This Proceedings of a Workshop—in Brief was prepared by Anne Frances Johnson, Jessica De Mouy, and Andrew Bremer as a factual summary of what occurred at the workshop. The statements made are those of the rapporteurs or individual workshop participants and do not necessarily represent the views of all workshop participants; the planning committee; or the National Academies of Sciences, Engineering, and Medicine.
REVIEWERS To ensure that it meets institutional standards for quality and objectivity, this Proceedings of a Workshop—in Brief was reviewed by Tessa Alexanian, IBBIS; Gerald Epstein, Johns Hopkins University; and Amina Qutub, University of Texas, San Antonio. Lauren Everett, National Academies of Sciences, Engineering, and Medicine, served as the review coordinator.
SPONSORS This workshop was supported by the U.S. Department of Defense and National Security Commission on Emerging Biotechnology.
SUGGESTED CITATION National Academies of Sciences, Engineering, and Medicine. 2024. Artificial Intelligence and Automated Laboratories for Biotechnology: Leveraging Opportunities and Mitigating Risks: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. https://doi.org/10.17226/27469.
Division on Earth and Life Studies. Copyright 2024 by the National Academy of Sciences. All rights reserved.