Abstract
Genomic data provides useful information for public health practice, particularly when combined with epidemiologic data. However, sampling bias is a concern because inferences drawn from nonrandomly sampled data can be misleading. In March 2021, the Washington State Department of Health, USA, partnered with submitting and sequencing laboratories to establish sentinel surveillance for SARS-CoV-2 genomic data. We analyzed genomic and epidemiologic data available during the presentinel and sentinel periods to assess the representativeness of the data and the timeliness of its availability. Genomic data from the presentinel period was largely unrepresentative of all COVID-19 cases. Data from the sentinel period was more representative with respect to age, death from COVID-19, outbreak association, long-term care facility-affiliated status, and geographic coverage; timeliness of data availability and captured viral diversity also improved. Hospitalized cases were underrepresented, indicating a need to increase inpatient sampling. Our analysis emphasizes the need to understand and quantify sampling bias in phylogenetic studies and to continue evaluating and improving public health surveillance systems.