Why did Wuhan University researchers delete COVID-19 data at NIH?

It was NOT a cover-up and the data was NOT very valuable either, says Chinese official

Jul 22, 2021

Your Pekingnologist attended the press conference on COVID origin tracing on Thursday morning (GMT+8) at the State Council Information Office in Beijing and posted a live-tweet thread, if you are interested.

In this brief newsletter, your Pekingnologist wants to highlight one thing, which press reports of the press conference so far - from Reuters, the Washington Post, Al Jazeera, AP, The New York Times, and Bloomberg - haven’t covered yet.

Remember this story across international media in late June - a month ago?

The New York Times: Scientist Finds Early Virus Sequences That Had Been Mysteriously Deleted

Wall Street Journal: Chinese Covid-19 Gene Data That Could Have Aided Pandemic Research Removed From NIH Database

Financial Times: US says Chinese scientists asked for removal of virus records from database

Bloomberg: U.S. Confirms Removal of Wuhan Virus Sequences From Database

Nature: Deleted coronavirus genome sequences trigger scientific intrigue

When all of these media outlets run a story on the same thing at around the same time, there is no need to explain why it was newsworthy, or Pekingnology-worthy.

What’s the story? To summarize, in the words of the Financial Times report:

Records of early Covid-19 cases in Wuhan were deleted from a US database at the request of Chinese scientists, American officials have confirmed.
A team of academics from Wuhan, where the first documented cases of Covid-19 appeared, submitted sequences of the virus that causes the disease to a US-based archive in March 2020.
Three months later, however, they asked for those sequences to be removed and the data were deleted, the US National Institutes of Health said on Wednesday, confirming the results of an investigation by biologist Jesse Bloom.
“Submitting investigators hold the rights to their data and can request withdrawal of the data,” NIH said in a statement.
The deleted information did not prove how Covid-19 first infected humans, whether via animals or a laboratory leak from the Wuhan Institute of Virology. But experts said the incident demonstrated further evidence of how Chinese researchers and officials have not been fully transparent in how they dealt with data related to the pandemic’s origins.

Basically, Jesse Bloom, a scientist in the U.S., found that some SARS-CoV-2 sequencing data uploaded by Chinese scientists had later been withdrawn, recovered the data from Google Cloud, and then published a pre-print and a Twitter thread, which led to the news reports.

Bloom Lab @jbloom_lab

In a new study, I identify and recover a deleted set of #SARSCoV2 sequences that provide additional information about viruses from the early Wuhan outbreak: biorxiv.org/content/10.110… (1/n)

biorxiv.org: Recovery of deleted deep sequencing data sheds more light on the early Wuhan SARS-CoV-2 epidemic. The origin and early spread of SARS-CoV-2 remains shrouded in mystery. Here I identify a data set containing SARS-CoV-2 sequences from early in the Wuhan epidemic that has been deleted from the NIH’s Sequence Read Archive. I recover the deleted files from the Google Cloud, and reconstruct partial se…

Bloom Lab @jbloom_lab

There are also broader implications. First, fact this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared. We already know many labs in China ordered to destroy early samples: scmp.com/news/china/soc… (16/n)

The tone of the press reports was that the withdrawal of the data was suspicious at the very least, though some scientists were quoted casting doubt on the significance of the deleted data. From the Wall Street Journal report:

Stephen Goldstein, a University of Utah evolutionary virologist who wasn’t involved in Dr. Bloom’s research, said it was unclear if any new insights could be gleaned from the deleted sequences. “From a scientific standpoint, I don’t think they point to anything nefarious,” he said, adding that he had not made his own analysis of the sequences.

Here is some criticism from Dr. Angela Rasmussen on June 23rd:

Dr. Angela Rasmussen @angie_rasmussen

@klausenhauser @jponline77 @krylormaximus Jesse gets some facts wrong about those sequences. They were not the earliest samples collected. They were collected in January, and the WHO report included sequences from December 2019. Plus he omits the fact they were published in a different paper.

Dr. Angela Rasmussen @angie_rasmussen

@klausenhauser @jponline77 @krylormaximus He has no reason to suggest that there was anything inappropriate about deleting them from SRA and he’s relying on the premise that scientists in China can’t be trusted when he asks the reader to draw their own conclusion.

What’s the Chinese side of the story?

In today’s press conference, a question was raised on this particular incident, and Zeng Yixin, vice minister of China's National Health Commission, gave a quite detailed answer:

这个事情报道出来以后，我们马上对这个事情进行了调查、了解。过程是这样的。报道里面提到的序列删除的问题，是起源于武汉大学的一些研究人员他们发表的一篇论文，在一个国际刊物《SMALL》上，论文题目是《纳米孔靶向测序用于准确和全面检测SARS-CoV-2和其他呼吸道病毒》，从这个名字可以看出来这篇文章报了一种测序方法。3月份他们投稿的时候需要测序结果，就是你建立这样一个方法，你进行了测序，你的测序结果怎么样。需要测序结果来判断测序的准确性，方法是不是可靠。所以研究者将具体的新冠肺炎病毒的测序结果上传到美国NCBI数据库，这个数据库是由NIH，也就是美国国立卫生院管理的数据库。
After this incident was reported, we immediately conducted an investigation and gained an understanding of it. Here's how it happened. The problem of sequences being deleted mentioned in the press reports originated from a paper by some researchers from Wuhan University, in the international journal Small. The title of the paper was "Nanopore Targeted Sequencing for the Accurate and Comprehensive Detection of SARS-CoV-2 and Other Respiratory Viruses."
From the title, we can see that this paper reported (centered on) a sequencing method. In March, when they submitted their paper, they needed (to show) sequencing results: once you have established such a method and carried out sequencing, how good are your results? Sequencing results are needed to judge the accuracy of the sequencing and whether the method is reliable.
So the researchers uploaded the sequencing results of specific COVID samples to the NCBI (National Center for Biotechnology Information) database, which is managed by the NIH, the National Institutes of Health (of the United States).
6月9日，这个杂志向研究者发送拟出版的样稿，给他发这个样稿。这时候研究者发现，文章中原来有的描述病例样本病毒测序数据上传地址的内容在审稿过程中被删除了。所以研究者认为，没有必要再把数据存放在NCBI数据库中，研究者于去年6月16日给NIH发邮件要求撤回数据。NIH按照工作流程自行删除，你提出要删除，就自动帮你删掉了，无需通知研究者。既然是你自己提出来的，就把它删了，也没再通知研究者，研究者也把这个事儿给忽略了。所以从这个过程中看出来，这个研究者完全没有去隐瞒、掩盖的必要性，没有这个主观意图。近期，研究者已将所有61个新冠肺炎样本的244条测序相关数据上传到中国国家生物信息中心建设的GSA数据库，这个数据库是公开的，全球研究人员都可以看到，都可以查询。
On June 9, the journal Small sent the researchers the proofs of the paper ahead of publication. At this point, the researchers found that the original content (written by these Chinese researchers) in the paper describing where the SARS-CoV-2 sequencing data of the case samples had been uploaded had been deleted during the review process (of their paper).
Therefore, the researchers thought it was no longer necessary to store the data in the NCBI database. The researchers sent an email to the NIH on June 16 last year (2020) requesting that the data be withdrawn. The NIH deleted the data according to its own procedures - if you ask for data to be deleted, it gets deleted, with no need to notify the researchers. Since the researchers had made the request themselves, the data was deleted with no further notice to the (Chinese) researchers, who then went on to forget about it. So, from this process, (you can see that) the researchers had no need to hide or cover up (anything) - they didn’t have this subjective intention.
Recently, the researchers uploaded all 244 sequencing-related data records from the 61 COVID-19 samples to the GSA (Genome Sequence Archive) database built by the China National Center for Bioinformation. The database is open and can be viewed and queried by researchers around the world.
根据我们了解，这批样本最早的采样时间1月30日，离疫情开始已经过去了一段时间了，其实它不是早期样本。这些序列对新冠病毒溯源研究能够提供的信息和价值都是很有限的。
According to our understanding, the earliest sampling date in this batch of samples was January 30, by which point some time had already passed since the COVID outbreak began. In fact, these are not early samples. These sequences provide limited information and value for COVID-19 origin tracing.
但是美国的一位研究人员Fred Hutchinson癌症中心的Jesse Bloom没有得到中国学者的确认，完全也不了解这个事情来龙去脉的背景下，就杜撰了所谓的阴谋论，说这是想掩盖的。他这种阴谋论在国际舆论界造成了很不好影响，对中方研究者进行了诬蔑，对中方研究者造成了伤害，他这种做法是背离科学的，也违反了科学伦理。后来论文出来以后也遭到了许多国家专家的批评，你这个做法不科学，违反科学伦理。在疫情流行期间，民众对于专业人员特别是科学家是高度关注，科学家的一言一行都是高度敏感的，所以每一名专家学者都应该明白我们肩上所肩负的社会责任，特别是像疫情流行期间，关于疫情相关的言论，老百姓是非常关注高度敏感，我们一定要明白我们身上的社会责任，要尽自己的努力为全社会的疫情防控做出我们专业人员的贡献，要正确地引导舆情，不要随心所欲的去猜测，造成不好的影响，这会把全社会的疫情防控带歪的。所以我觉得应该提醒每一位专家从这个事情上面接受教训……
However, a U.S. researcher, Jesse Bloom at the Fred Hutchinson Cancer Research Center, without confirmation from the Chinese researchers and having no knowledge of the ins and outs of the matter, made up a conspiracy theory saying it was a cover-up.
This conspiracy theory of his has had a very bad influence on international public opinion, vilifying and hurting Chinese researchers. His practice is a departure from science and a violation of scientific ethics.
Later, after (Bloom’s) paper came out, he was criticized by experts from many countries, who said what he did was unscientific and violated scientific ethics.
During the pandemic, the public pays close attention to professionals, especially scientists, and everything a scientist says or does is highly sensitive. Therefore, every expert and scholar should understand the social responsibility we shoulder. Particularly during the pandemic, people pay close attention to, and are highly sensitive about, statements related to the pandemic. We must understand the social responsibility we bear and do our best to contribute, as professionals, to pandemic prevention and control across society. We should guide public opinion correctly and not speculate at will, which would have a bad effect and lead society's pandemic prevention and control astray. Therefore, I think we should remind every expert to learn a lesson from this matter...

To sum up, even though a response to this drama could have come sooner, the reason behind the withdrawal of the data, as elaborated by Vice Minister Zeng, was a purely technical one, not nearly as sensational as it had been made out to be.

With contribution from Yang Liu, host of the Beijing Channel newsletter.

3 Comments

Virginie Courtier-Orgogozo

Jul 31, 2021

Thank you for this text. It would be good for the Chinese researchers to be allowed to answer to Jesse Bloom's email. Researchers should be allowed to exchange ideas and knowledge with their colleagues.