Pandas in Wuhan market? China’s COVID genetic study is out—it has problems

Giant panda cub Huanlili plays with a bamboo during her first birthday at the Beauval zoological park in Saint-Aignan, central France, on August 2, 2022.

Chinese scientists have published their long-awaited genetic analysis of the samples and swabs they collected in early 2020 from the Huanan Seafood Market, the initial epicenter of the pandemic.

In the study, published Wednesday in Nature, the authors acknowledge for the first time that wildlife susceptible to SARS-CoV-2 infection, including raccoon dogs, was present in the market amid the plethora of genetic traces from SARS-CoV-2 and humans. But the overall analysis is flawed: it indicates the presence of animals that were almost certainly not at the market, including giant pandas, chimpanzees, and Atlantic grey seals. The authors also continue to downplay the possibility that a virus spillover from wildlife to humans in the crowded market was the spark that ignited the pandemic. Instead, they repeatedly put forward, without evidence, hypotheses favored by Chinese officials, namely that the virus was carried into the market via humans or frozen foods, and that the bustling venue merely became an amplifier site for infection.

Still, the publication of the data is momentous, and a long time coming. Though the samples were collected from January 1 to March 30 of 2020, a draft of the study and some of the data were first released only two years later, in a February 2022 preprint. The preprint reported that SARS-CoV-2 was abundant amid human genetic material from the samples, indicating that the virus was prevalent among people at the market before it was shuttered on the morning of January 1, 2020. The authors, led by scientists at the Chinese Center for Disease Control and Prevention (China CDC), noted that they had also tested some animals in the market (mostly rabbits, stray cats, and snakes), but all were negative for SARS-CoV-2.

Withheld data

It wasn't until last month, three years after the samples were collected, that more genetic information from those samples came to light. In preparation for the publication in Nature, China CDC scientists quietly uploaded previously undisclosed metagenomic data from the samples onto a public genetic database, called GISAID, sometime in January. In early March, a group of independent international scientists noticed the data, eagerly downloaded it, and began analyzing it while reaching out to the China CDC scientists about a possible collaboration. The China CDC scientists responded by having their data pulled from public view, and GISAID publicly accused the international researchers of breaching terms of service, which they have emphatically denied.

Amid the data-access dispute, however, the international group published a preliminary analysis of the data, without publishing the underlying genetic data itself to avoid "scooping" their Chinese colleagues. Overall, that preliminary analysis showed that the environmental samples from the market were not just positive for SARS-CoV-2 and human genetic material—as the 2022 preprint suggested—but were also brimming with genetic traces of wildlife, including some known to be susceptible to SARS-CoV-2 infections, such as raccoon dogs.

The study—led by Michael Worobey, an evolutionary biologist at the University of Arizona; Kristian Andersen, a virologist at the Scripps Research Institute in California; and Florence Débarre, a theoretician who specializes in evolutionary biology at France's national research agency, CNRS—provided the first genetic evidence linking SARS-CoV-2-positive samples, humans, and susceptible wild animals together in the market.

The analysis cannot determine if the animals were infected with the pandemic virus, or, if they were, whether any animal-to-human or human-to-animal transmission occurred. Thus, it can't conclusively determine how the pandemic began. However, as many virologists and infectious disease experts have since noted, if a natural spillover event did spark the pandemic, this close mingling of genetic material in a suspect market at the epicenter of early cases is exactly the type of genetic evidence scientists would expect to find after the fact. Such markets, with a menagerie of wildlife in close, crowded conditions with humans, are known to act as hotbeds of risk for viral adaptation and spillovers.

In particular, Worobey and his colleagues focused on one sample from a cart (Q61 or env_0576) that was surrounded by a high density of SARS-CoV-2-positive samples and was, itself, teeming with raccoon dog genetic material. The researchers found that the sample contained 1,252 genetic fragments with 100 percent identity to the raccoon dog genome and no such perfect matches to the human genome. The finding hints at the possibility that the SARS-CoV-2 present was from the raccoon dog, not humans.

Genetic pandamonium

But in the newly published analysis of the same metagenomic data, the China CDC scientists identified the genetic material from that sample not as coming from raccoon dogs, but as coming from actual dogs (Canis), which are in the same taxonomic family but in a different genus. Virologist Angela Rasmussen, a coauthor of the analysis led by Worobey and colleagues, noted on social media that they, too, first identified Canis in their analysis, but then realized they were using the wrong reference database of genomes, fixed it, and properly identified the raccoon dog sequences.

The authors of the Nature study seemed to acknowledge that they also used the wrong reference database in one of the two genetic approaches they applied. But they did not fix the problem before their study was published. Instead, they just noted their findings were "not definitive."

"In particular, the proportion of reads assigned as raccoon dog differ considerably with the two methods used," they wrote of their two approaches. "This may be due to the heterogeneity of the reference data used by the two methods (BOLD, as for mitochondria, and kraken2 for whole genome). It should be noted that the genera identified using current approaches might be updated with additional reference genomes. As such, this list is not definitive, and further in-depth analysis with other methods will be required to provide more information regarding the wildlife species present at the market."

But the genetic flaws don't appear to end with the misidentification of raccoon dog genetic material. According to the Nature study, the China CDC scientists also found traces of various animals that were extremely unlikely to be present in the market. That includes the highly protected, and generally large, giant panda (Ailuropoda), as well as bears (Ursus), chimpanzees (Pan), grey seals found in the North Atlantic (Halichoerus), and rodents from the Chilean Andes (Octodon). It's unclear why the Chinese scientists did not correct their analysis, but now that the metagenomic data is publicly available following the publication, outside researchers are keen to dig into it.

Regardless of what animals were present, "The possibility of potential introduction of the virus to the market through infected humans, or cold chain products, cannot be ruled out yet," the Chinese authors concluded. Still, they call for more surveillance of animals. "Surveillance of wild animals should be enhanced to explore the potential natural and intermediate hosts for SARS-CoV-2, if any, which would help to prevent future pandemics caused by animal-origin coronaviruses."