This rapid expert consultation was produced by the Societal Experts Action Network (SEAN), a project of the National Academies of Sciences, Engineering, and Medicine with support from the National Science Foundation. Its aim is to enable leaders such as you to gain insight into the strengths and weaknesses of the data on the COVID-19 pandemic in your community by applying five criteria to seven types of data available to support decision making. By understanding these characteristics, you can work with the data type best-suited to the question at hand, and use the data you have to inform your decisions most effectively.
The seven data types are: the number of confirmed cases, hospitalizations, emergency department visits, reported confirmed COVID-19 deaths, excess deaths, the fraction of viral tests that are positive, and representative prevalence surveys (including both viral and antibody tests). The five criteria are: representativeness; bias; uncertainty, including measurement and sampling error; time; and space. The importance of each of these five criteria depends on the nature of the decision being made, and each data type has different strengths and weaknesses.
Each data type represents a piece of the puzzle, and when used in combination, the various types form a clearer picture of how the disease is spreading and its severity. Because any single data type is likely to yield an under- or over- estimate of the extent and spread of the disease, it is important to consider multiple data types and be cautious in relying on estimates without considering sources of bias. The key implications for decision makers are summarized in Box 1 below.
Fortunately, more information about how COVID-19 is affecting the nation is now available, but as is so often the case, the information comes in various forms and is not always complete. The purpose of this rapid expert consultation is to help decision makers, especially at state and local levels, better understand and evaluate the strengths and limitations of the various data types being used as indicators of the extent and spread of COVID-19 in their communities. This enhanced understanding can lead to more informed decisions on critical issues that depend on those indicators, such as when to lift social distancing restrictions, allow public gatherings, or reopen businesses. Drawing on relevant literature and expert judgment, this rapid expert consultation describes the considerations that apply in using the available data while taking account of their limitations. It reviews in turn:
This rapid expert consultation addresses the assessment of the seven data types and the implications of those assessments for decision making; it does not recommend specific policy actions.
Specific features of the disease and of the response to the pandemic have implications for this assessment of data types. According to the Centers for Disease Control and Prevention (CDC) (2020a), the incubation period for COVID-19 is thought to be up to 14 days, with a median of 4–5 days from exposure to onset of symptoms; deaths therefore reflect infections acquired several weeks previously. This long incubation period and progression of infection, as well as the possibility of asymptomatic cases, have implications, discussed below, for interpreting the different data types. In addition, determining both the prevalence of COVID-19 and deaths from the disease depends on the availability and accuracy of testing. In the early days of the pandemic, viral tests were rationed, and it was difficult for people to get tested. Viral tests have become more widely available, but are still administered mainly to people with symptoms. Antibody tests have also become more widely available, but are of variable quality. The utility of antibody tests depends on the sensitivity and specificity of the assays, and testing at this point could result in relatively more false-positive and fewer false-negative results.1 Some demographic groups, such as the elderly, African Americans, Latinos, and Native Americans, have been disproportionately affected by the virus, suggesting that data for these groups may deserve particular attention. Data collection should include the information needed to examine such disparities; at present, that information is frequently missing.
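As a rough illustration of why low prevalence produces relatively more false positives, the positive predictive value of a test can be computed with Bayes' rule. The sensitivity and specificity figures below are illustrative assumptions, not CDC estimates:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive antibody test reflects a true infection,
    computed via Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Illustrative assumption: an assay with 95% sensitivity and 95% specificity.
# At 2% prevalence, most positive results are false positives:
positive_predictive_value(0.02, 0.95, 0.95)   # ≈ 0.28
# At 25% prevalence, most positive results are true positives:
positive_predictive_value(0.25, 0.95, 0.95)   # ≈ 0.86
```

The same arithmetic underlies the caution in footnote 1: as prevalence falls, the pool of uninfected people generating false positives grows relative to the pool of true positives.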
___________________
1 According to the CDC (2020), evidence “suggests that the presence of antibodies may decrease a person’s infectiousness and offer some level of protection from reinfection. However, definitive data are lacking, and it remains uncertain whether individuals with antibodies (neutralizing or total) are protected against reinfection with SARS-CoV-2, and if so, what concentration of antibodies is needed to confer protection….pending additional data, the presence of antibodies cannot be equated with an individual’s immunity from SARS-CoV-2 infection.” Moreover, “the utility of tests depends on the sensitivity and specificity of the assays....In most of the country, including areas that have been heavily impacted, the prevalence of SARS-CoV-2 antibody is expected to be low, ranging from <5% to 25%, so that testing at this point might result in relatively more false-positive results and fewer false-negative results.” See https://www.cdc.gov/coronavirus/2019-ncov/lab/resources/antibody-tests-guidelines.html.
The following types of data on the extent and spread of COVID-19, some of which are highly correlated with each other, are being used to inform decision making:
Given the rapid evolution of understanding of the virus that causes COVID-19, additional data types are emerging. For instance, surveillance of wastewater to detect the virus that causes COVID-19 could provide information to communities about the virus’s reemergence, and some researchers are using cell phone data to track compliance with social distancing guidelines.
The utility of data for decision making is affected by many factors, including the burden of collecting, cleaning, and interpreting the data across sources. Also, data collection and models tend to improve over time, so their assessment will also need to be updated regularly.2 Meanwhile, decision makers must use the data that are available while understanding their limitations. To this end, the following five criteria can be considered:
___________________
2 This document does not specifically review models, but the data types reviewed are typically the inputs to models. Thus, understanding the characteristics of the data inputs can inform understanding of models and similar forecasting tools related to the course of the pandemic.
Table 1 shows the seven data types listed above against the five criteria for assessing their reliability and validity. Check marks indicate that a data type generally meets a criterion, while the triangles denote the need for caution, meaning that the questions listed above under a criterion should be asked to better understand the quality of the data.
Decisions must be made in critical situations even when there is uncertainty about the best available data. It is important for decision makers to be aware of the strengths and weaknesses of the data they receive. This requires that a decision maker rely on the data available to the extent that they promote better decision making, while being mindful of the following cautions:
Table 1: Assessment of Data Types by Criteria for Reliability and Validity
| Data Type | Representativeness | Bias | Uncertainty, Measurement & Sampling Error | Time | Space |
|---|---|---|---|---|---|
| Number of confirmed cases | |||||
| Key Implication for Decision Making: This measure is readily available, but is likely to be a substantial underestimate of the prevalence of the disease in a population given that most people with COVID-19 are asymptomatic, and even among those who are symptomatic, not all are tested. As the volume of testing expands to include populations with less severe symptoms and asymptomatic individuals, this measure will be increasingly useful for determining the prevalence of COVID-19. | |||||
| Hospitalizations | |||||
| Key Implication for Decision Making: Data on hospitalizations are typically available quickly at the local level, although the completeness of reporting may vary from day to day. These data reflect only the most severe cases of infection, but changes in the number of hospitalizations likely reflect similar changes in the total number of infections within a community. Note that patients requiring hospitalization were exposed several weeks previously. | |||||
| Emergency department visits | |||||
| Key Implication for Decision Making: In some jurisdictions, data on emergency department (ED) visits are available at the local level in close to real time. The reason for the visit can be reported either as a syndrome (e.g., “influenza-like illness”) or as a specific diagnosis (e.g., “COVID-19”). These data are most useful in the early stages of an outbreak or to assess resurgence, though it should be noted that patients with symptoms were exposed up to 2 weeks earlier. | |||||
| Reported deaths | |||||
| Key Implication for Decision Making: Reported COVID-19 deaths are affected by the accuracy of cause-of-death determinations and reflect the state of the outbreak several weeks previously because of the long course of COVID-19 infection. Sometimes lags in reporting of data also occur. | |||||
| Excess deaths | |||||
| Key Implication for Decision Making: Compared with the other data reviewed here, excess deaths are the best indicator of the mortality impacts of the pandemic. However, because of the possibility of death misclassification, these data represent a mix of confirmed COVID-19 deaths and deaths from other causes. | |||||
| Fraction of viral tests that are positive | |||||
| Key Implication for Decision Making: These data may not be an adequate measure of prevalence, depending on testing criteria. If mainly symptomatic people are tested, this figure is expected to overestimate the true community prevalence. The proportion is expected to decline as testing expands to include mildly symptomatic and asymptomatic people. | |||||
| Prevalence surveys (representative) | |||||
| Key Implication for Decision Making: Representative prevalence surveys are the best strategy for understanding the prevalence of a disease in any given population at a specific point in time. Such surveys can be undertaken for specific populations (e.g., workplace, nursing home, jails and prisons). Although they require undertaking a special study rather than using routinely collected data, many public health agencies have this capacity. There will be some time lag involved, however, in mounting and interpreting such a survey. | |||||
✓ Data source usually meets this criterion.
△ Data source may or may not meet the criterion, and questions related to that criterion should be asked.
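As a rough sketch of how the excess-deaths measure in Table 1 is computed, observed deaths for a period are compared against a baseline expected from the same period in prior years (real analyses also model trend and seasonality). The death counts below are invented for illustration:

```python
def excess_deaths(observed, prior_year_counts):
    """Excess deaths for a period: the observed count minus a simple
    baseline, here the average of counts for the same period in prior
    years."""
    baseline = sum(prior_year_counts) / len(prior_year_counts)
    return observed - baseline

# Hypothetical weekly death counts for one jurisdiction:
excess_deaths(1450, [1010, 990, 1020, 980])  # 450.0 above the 1,000 baseline
```

Because the baseline absorbs deaths that would have occurred anyway, this measure captures both confirmed COVID-19 deaths and pandemic-related deaths from other causes, as noted in the table.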
This section applies the five criteria described in section 2 to the seven data types commonly used to make COVID-19 policy decisions as outlined in section 1. Decision makers should use the data available to them, as they represent some of the best indicators currently available, while being explicit about their limitations and highlighting questions that should be asked of those providing the data.
___________________
3 It is of course possible that overestimation could occur, as would be the case if someone who is positive for COVID-19 has been hospitalized for a different reason (e.g., heart attack). Multiple factors contribute to a person’s state of health, and there may be some differentiation in how hospitals classify the reason for hospitalization. That said, the potential for such overestimation is less concerning than the underestimation described above in terms of assessing public health risks.
For these surveys, a representative sample of people to be tested is selected. The World Health Organization (WHO) has produced a protocol for such surveys for COVID-19 (World Health Organization, 2020). Dean (2020) outlines the advantages and challenges of such surveys, as well as ways to make the most of them. Such surveys can be conducted at the local, state, or national level; Oregon, Indiana, and Ohio have initiated such efforts. Similar surveys are often carried out for social science or market research as well as epidemiological purposes, so the methodology is well established. Several such surveys have been conducted to capture the prevalence of COVID-19, including the COVID-19 Impact Survey, which administers symptom checkers to representative samples in 18 subnational areas (Wozniak et al., 2020; Vogel, 2020; Joseph and Branswell, 2020).
___________________
4 See also http://freerangestats.info/blog/2020/05/09/covid-population-incidence.
5 If the test is not perfect, the observed prevalence can be adjusted as follows: Adjusted prevalence = (Observed prevalence + Specificity – 1)/(Sensitivity + Specificity – 1) (Rogan and Gladen, 1978).
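The Rogan–Gladen correction in footnote 5 can be applied directly; a minimal sketch, with assay characteristics chosen only for illustration:

```python
def adjusted_prevalence(observed, sensitivity, specificity):
    """Rogan-Gladen (1978) correction of observed prevalence for an
    imperfect test, clamped to a valid proportion in [0, 1]."""
    adjusted = (observed + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(adjusted, 0.0), 1.0)

# Example: 6% of samples test positive on an assay assumed to be
# 90% sensitive and 98% specific; the corrected estimate is lower.
adjusted_prevalence(0.06, 0.90, 0.98)  # ≈ 0.045
```

The clamp matters in practice: when observed prevalence is below the false-positive rate, the raw formula goes negative, and zero is the sensible estimate.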
The COVID-19 pandemic is a reminder, once again, of the importance of evidence and a robust public health data infrastructure. Decision making related to the pandemic requires the use of data often not designed for the task at hand. With greater understanding of the strengths and limitations of these data, decision makers can make better decisions. Continued investment in public health and its data surveillance structures is needed to meet the nation’s current and future public health challenges.
SEAN is interested in your feedback. Was this rapid expert consultation useful? Send comments to
sean@nas.edu or (202) 334-3440.
___________________
7 Convenience samples are constructed from a group of people who are easy to contact or reach; they are not random samples.
8 For prevalence surveys, it is sometimes possible to use proxies for representative samples. For instance, in March and April 2020, seroprevalence surveys of health care workers in many major medical centers did not do a bad job of anticipating the local area prevalence. Health care workers are at higher occupational risk, but they are also more affluent than average, so those biases cancelled each other out somewhat. Another example might be a large heterogeneous employer in a city that had all its employees tested; this might not be a bad proxy for a truly representative sample for that city.
Arias, E., Heron, M., and Hakes, J.K. (2016). The validity of race and Hispanic-origin reporting on death certificates in the United States: An update. Vital and Health Statistics 2(172), 1-21. Available: https://www.researchgate.net/publication/306079754_The_Validity_of_Race_and_Hispanicorigin_Reporting_on_Death_Certificates_in_the_United_States_An_Update.
Azar, K., Shen, Z., Romanelli, R., Lockhart, S., Smits, K., Robinson, S., Brown, S., and Pressman, A. (2020). Disparities in outcomes among COVID-19 patients in a large health care system in California. Health Affairs, 39(7), 1-8.
Bedford, T., Greninger, A.L., Roychoudhury, P., Starita, L.M., Famulare, M., et al. (2020). Cryptic transmission of SARS-CoV2 in Washington State. MedRxiv Preprint. Available: https://www.medrxiv.org/content/medrxiv/early/2020/04/16/2020.04.02.20051417.full.pdf.
Bellisle, M. (2020). Washington State’s actual coronavirus death toll may be higher than current tallies, health officials say. Seattle Times. May 21. Available: https://www.seattletimes.com/seattle-news/health/washington-states-actual-coronavirus-death-toll-may-be-higher-than-current-tallies-health-officials-say.
Biemer, P., and Lyberg, L. (2008). Introduction to Survey Quality. New York: Wiley Interscience.
Borjas, G.J. (2020). Demographic determinants of testing incidence and COVID-19 infection in New York City neighborhoods. NBER Working Paper 26952. April. Available: https://www.nber.org/papers/w26952.pdf.
Centers for Disease Control and Prevention. (2020a). Interim clinical guidelines for management of patients with confirmed coronavirus disease (COVID-19). Available: https://www.cdc.gov/coronavirus/2019-ncov/hcp/clinical-guidance-management-patients.html.
______. (2020b). Preliminary estimate of excess mortality during the COVID-19 outbreak—New York City, March 11–May 2, 2020. Morbidity and Mortality Weekly Report, 69(19), 603-605.
Dean, N.E. (2020). COVID-19 data dives: The takeaways from seroprevalence surveys. Medscape. May 4. Available: https://www.medscape.com/viewarticle/929861.
Hartnett, K.P., Kite-Powell, A., DeVies, J., Coletta, M.A., Boehmer, T.K., Adjemian, J., and Gundlapalli, A.V. (2020). Impact of the COVID-19 pandemic on emergency department visits—United States, January 1, 2019–May 30, 2020. Morbidity and Mortality Weekly Report, 69 (early release, June 3, 2020).
Henning, K.J. (2004). What is syndromic surveillance? Syndromic Surveillance: Reports from a National Conference, 2003. Morbidity and Mortality Weekly Report, 53 (Suppl), 7-11.
Johndrow, J., Lum, K., Gargiulo, M. and Ball, P. (2020). Estimating the number of SARS-CoV-2 infections and the impact of social distancing in the United States. arXiv Preprint. Available: https://arxiv.org/pdf/2004.02605v2.pdf.
Joseph, A., and Branswell, H. (2020). The results of coronavirus ‘serosurveys’ are starting to be released. Here’s how to kick their tires. STAT, April 24. Available: https://www.statnews.com/2020/04/24/the-results-of-coronavirus-serosurveys-are-starting-to-be-released-heres-how-to-kick-their-tires.
Rogan, W.J., and Gladen, B. (1978). Estimating prevalence from the results of a screening test. American Journal of Epidemiology, 107(1), 71-76. Available: https://pubmed.ncbi.nlm.nih.gov/623091.
Vogel, G. (2020). Antibody surveys suggesting vast undercount of coronavirus infections may be unreliable. Science, April 21. Available: https://www.sciencemag.org/news/2020/04/antibody-surveys-suggesting-vast-undercount-coronavirus-infections-may-be-unreliable.
World Health Organization. (2020). Population-based age-stratified seroepidemiological investigation protocol for COVID-19 virus infection. Available: https://apps.who.int/iris/handle/10665/331656.
Wozniak, A., Willey, J., Benz, J., and Hart, N. (2020). COVID Impact Survey. Chicago, IL: National Opinion Research Center.
Special thanks go to our colleagues on the SEAN executive committee, who dedicated time and thought to this project: Dominique Brossard, University of Wisconsin, Madison; Michael Hout, New York University; Arati Prabhakar, Actuate; and Jennifer Richeson, Yale University.
We extend gratitude to the staff of the National Academies of Sciences, Engineering, and Medicine, in particular to Emily P. Backes, who contributed research, editing, and writing assistance. Thanks are also due to Mike Stebbins (Science Advisors, LLC and Federation of American Scientists) and Kerry Duggan (SustainabiliD, LLC and Federation of American Scientists), consultants to SEAN, who provided additional editorial and writing assistance. We also thank Rona Briere for her skillful editing.
To supplement their own expertise, the authors received input from several external sources, whose willingness to share their perspectives and expertise was essential to this work. We thank Oxiris Barbot, New York City Department Health and Mental Hygiene; Paul Biemer, RTI International and University of North Carolina, Chapel Hill; Ron Carlee, Old Dominion University; Jeffrey Eaton, Imperial College London; Thomas Farley, Philadelphia Department of Public Health; William Hanage, Harvard T.H. Chan School of Public Health; Stéphane Helleringer, The Johns Hopkins University; Claude-Alix Jacob, Cambridge Public Health Department; Nancy Krieger, Harvard T.H. Chan School of Public Health; Roger J. Lewis, Harbor-UCLA Medical Center; Linda Langston, Langston Strategies Group; Roderick Little, University of Michigan; Christopher J. L. Murray, University of Washington; Annise Parker, Victory Fund and Victory Institute; and John Shirey, City of Sacramento (retired).
We also thank the following individuals for their review of this rapid expert consultation: Georges C. Benjamin, American Public Health Association; Nicholas A. Christakis, Yale University; Ana Diez-Roux, Drexel University; David Dowdy, Johns Hopkins University; Adriana Lleras-Muney, University of California, Los Angeles; Abigail Wozniak, Federal Reserve Bank of Minneapolis; Emilio Zagheni, Max Planck Institute for Demographic Research.
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions of this document, nor did they see the final draft before its release. The review of this document was overseen by Susan J. Curry, The University of Iowa and Alicia L. Carriquiry, Iowa State University. They were responsible for making certain that an independent examination of this rapid expert consultation was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authors and the National Academies.
MARY T. BASSETT, (Co-chair), Harvard University
ROBERT M. GROVES, (Co-chair), Georgetown University
DOMINIQUE BROSSARD, University of Wisconsin, Madison
JANET CURRIE, Princeton University
MICHAEL HOUT, New York University
ARATI PRABHAKAR, Actuate
ADRIAN E. RAFTERY, University of Washington
JENNIFER RICHESON, Yale University
Staff:
MONICA N. FEIT, Deputy Executive Director, DBASSE
ADRIENNE STITH BUTLER, Associate Board Director
EMILY P. BACKES, Senior Program Officer
DARA SHEFSKA, Associate Program Officer
PAMELLA ATAYI, Program Coordinator