On October 10–11, 2023, the National Academies of Sciences, Engineering, and Medicine (the National Academies) hosted the U.S. Research Data Summit at the National Academy of Sciences Building in Washington, DC. The summit was undertaken by a planning committee organized under the U.S. National Committee for CODATA.
The summit was informed by input from 29 organizations, including leaders from federal government agencies, the private sector, public and nonprofit organizations, and research institutions. A survey undertaken by the Association of Research Libraries in advance of the summit focused on gauging the demand for cross-sector research data collaboration and sustained communication among U.S. organizations.1
The response to the survey was overwhelmingly positive. Following the survey, the summit planning committee conducted a series of focus groups with 50 leaders from various sectors, whose missions are either centered on or fundamentally dependent on research data policies and practices. The insights from these focus groups, held between May and July 2023, helped identify seven themes for potential national-level collaboration: (1) artificial intelligence; (2) decarbonization; (3) treatment of Indigenous and minority data; (4) data quality evaluation; (5) training; (6) standards for data management; and (7) principles related to justice, equity, diversity, and inclusion. An additional task emerged during the October 2023 summit: engaging the next generation of data leaders.
___________________
1 Kennedy, M. L., and C. Hudson Vitale. 2022. Identifying Collaboration Priorities for US-Based Research Data Organizations: Questionnaire Results. Washington, DC: Association of Research Libraries. https://doi.org/10.29242/report.researchdataorgscollab2022.
The summit was designed to inspire participants and others to advance these initiatives. Targeted at organizational leaders with the authority to pursue collaborations, the summit began by sharing lessons learned from existing collaborative research data practices, with contributions from the National Science Foundation, the Foundation for Energy Security and Innovation, LinkedIn, Microsoft, the GovLab, the Historically Black Colleges and Universities (HBCU) Data Science Consortium, the Collaboratory for Indigenous Data Governance, and the Institute of Electrical and Electronics Engineers (IEEE). Participants were also guided by existing frameworks as they discussed and refined a draft set of guiding principles for future collaborations.
The summit was intended to be a first step along that path (see statement of task in Box 1-1).
This Proceedings of a Workshop was prepared by the workshop rapporteur as a factual summary of what was presented and discussed at the summit. The planning committee’s role was limited to planning and convening the summit. Any opinions, conclusions, or recommendations are those of the participants and do not represent a position of the National Academies. The summit was carried out under the Chatham House Rule, which ensures that participants and their affiliations remain confidential unless explicitly permitted by the speaker.
In her introductory remarks, Marcia McNutt, president of the National Academy of Sciences, provided context for the meeting and explained the timing of its convening. In the scientific community, she said, data-informed progress is more essential than ever. The U.S. federal government has recognized this, as indicated by the 2022 release of the so-called Nelson Memo from the White House’s Office of Science and Technology Policy. That memo instructed federal agencies to modify public access policies to ensure that any publications and supporting data resulting from federally funded research are made publicly available “without an embargo on their free and public release” (White House, 2022). Meanwhile, private foundations and industry are working to unlock the power of data to accelerate progress, most notably in areas related to artificial intelligence. It was within this context that the research data summit took place.
BOX 1-1 Statement of Task
A planning committee of the National Academies of Sciences, Engineering, and Medicine will plan and convene a U.S. Research Data Summit of national organizations across all sectors who are shaping and influencing U.S. research data policies and practices. The purpose of the summit is to discuss strategies to increase coherence of interests and activities among the various organizations in order to increase communication and collaboration and reduce unnecessary duplication of effort within the United States as well as position the United States to be well represented in international discussions on research data.
Invitees will include representatives of U.S. federal agencies’ data management programs, governmental groups focusing on data (such as Networking and Information Technology Research and Development [NITRD] and the White House Office of Science and Technology Policy’s National Science and Technology Council Subcommittee on Open Science [OSTP/SOS]), data projects with many stakeholders (such as the National Science Foundation’s Big Data Regional Innovation Hubs), disciplinary and data science professional societies, U.S. bodies of international data organizations (such as Research Data Alliance – United States [RDA-US], U.S. National Committee for CODATA [USNC/CODATA], and GO FAIR), and U.S.-based project funders.
In an environment where research data activities are increasingly being undertaken at national and global scales, McNutt said, the summit could serve as a catalyst for promoting cooperation and collaboration among U.S. research data organizations. This, she noted, would allow the United States to more effectively advance national interests and also benefit from international research data initiatives. “One goal of the summit,” she said, “is to inform a vision of success based on commitments by the organizations in the room to better align U.S. research data organizations and activities over the next 3 to 5 years.”
As highlighted in various National Academies reports, such as Open Science by Design (NASEM, 2018) and Reproducibility and Replicability in Science (NASEM, 2019), McNutt continued, “the accessibility and usability of data are crucial to scientific progress and to ensuring the rigor and reliability of reported research results.” To illustrate, she spoke about the development, during her time as editor-in-chief of Science magazine, of principles and guidelines for reporting preclinical research, with more than 30 major scientific journals agreeing on a common approach to improving transparency and reproducibility, including the sharing of data and materials. “Developing the principles and guidelines was challenging,” she said, “and the effort did not resolve all issues related to data sharing, but it was a significant and necessary step in improving reporting and publishing practices that we continue to build upon.”
Similarly, McNutt said, the research summit was designed to bring together relevant stakeholders and decision-makers to align efforts for the common good. Recognizing that different fields and disciplines have different practices and cultures related to data sharing, with some moving faster than others, she emphasized that by working across disciplines and sectors it may be possible to reduce or eliminate some of the major barriers to developing common practices and approaches to data sharing.
Following McNutt’s introductory remarks, Bonnie Carroll, chair of the U.S. National Committee for CODATA, detailed the objectives of the summit. The overall aim, she said, was to identify opportunities for cross-sector research data collaborations that could achieve their goals within the next 1 to 3 years. To support this aim, the summit brought together leaders of various organizations who could make a noticeable difference through such collaborations.
Carroll informed summit participants that the focus of the summit would not be on technical challenges, as numerous groups are already addressing those issues. Instead, she said, “we’re seeking your focus on the nature of the partnerships that you need to succeed, the kinds of strategic outcomes that you can do together, and the levers that are needed in order to achieve solutions that are only possible when leading together.”
Carroll presented a slide outlining four goals for the summit:
She emphasized that the recommendations should be stated as explicitly as possible. “We would love to hear your recommendations specifically for what you need from other organizations to facilitate and sustain communication and collaboration going forward,” she said, “including how to best bring the full breadth of research data and knowledge to international discussions on research data.”
Finally, she described the meeting’s goals: (1) develop guiding principles for working together, (2) identify one to three priorities with identified champions from each organization and the continued commitment to work together to advance them, and (3) create and sustain a community of research data organizations and empowered individuals that will pursue effective cooperation in the future.
The summit, she explained, was the culmination of more than a year’s work spearheaded by the U.S. National Committee for CODATA and the Association of Research Libraries, during which they engaged with research data organizations in academia, government, industry, and nonprofits. A major product of that work was a survey that resulted in the publication of Identifying Collaboration Priorities for U.S.-Based Research Data Organizations: Questionnaire Results (ARL, 2022). Working from the areas identified by that survey, the two organizations used focus groups to further refine them into six prioritized topics. Those topics served as a starting point for many of the summit’s discussions.
Following this introductory chapter, the proceedings has four additional chapters. Chapter 2 describes the presentations of invited speakers, who offered context on finding ways to collaborate on research data. Chapter 3 describes the efforts of the summit participants to identify guiding principles for establishing research data collaborations, culminating with five overarching principles endorsed by most summit participants. Chapter 4 summarizes outcomes of three breakout sessions, each focused on identifying opportunities for collaborative efforts in distinct areas related to research data. The participants in these sessions identified opportunities for collaborations, which were then prioritized through a voting process by the summit’s attendees. Finally, Chapter 5 describes the main output of the summit: four projects designed around the prioritized opportunities. For each project, “champions” from the summit’s attendees were identified, along with a set of stakeholders to involve, goals, and timelines for achieving these goals.
ARL (Association of Research Libraries). 2022. Identifying collaboration priorities for U.S.-based research data organizations: Questionnaire results. https://www.arl.org/wp-content/uploads/2022/11/Identifying-Collaboration-Priorities-for-US-Based-Research-Data-Organizations%E2%80%94Questionnaire-Results.pdf (accessed April 12, 2024).
NASEM (National Academies of Sciences, Engineering, and Medicine). 2018. Open science by design: Realizing a vision for 21st century research. Washington, DC: National Academies Press.
NASEM. 2019. Reproducibility and replicability in science. Washington, DC: National Academies Press.
White House. 2022. Memorandum for the heads of executive departments and agencies. Office of Science and Technology Policy, August 25. https://www.whitehouse.gov/wp-content/uploads/2022/08/08-2022-OSTP-Public-access-Memo.pdf (accessed April 12, 2024).