Data Deception: how data provenance failure undermines trust in AI analytics

Dr Kelvin Ross,
Datarwe CTO

While data privacy and the ethical use of data are widely considered by AI communities in relation to trust and assurance, a recent controversy in medical research highlights that trust in the origin of data has deep implications for the validity of analytical insights.

On 22 May 2020, The Lancet published a study, “Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis” (1). The article quickly became a blockbuster at the height of the COVID-19 outbreak, driven by a combination of intense medical research interest in potential COVID-19 cures and public and political interest following President Trump’s anecdotal support for hydroxychloroquine. The Lancet, one of medicine’s most respected journals, lauded the study in its editorial (2):

“Despite limitations inherent to the observational nature of this study, Mehra and colleagues should be commended for providing results from a well designed and controlled study of the effects of chloroquine or hydroxychloroquine, with or without a macrolide, in a very large sample of hospitalised patients with COVID-19. Their results indicate an absence of benefit of 4-aminoquinoline-based treatments in this population and suggest that they could even be harmful.”

This observational study gained widespread acclaim. It was derived from data on over 96,000 patients, published in one of the world’s most prestigious medical journals and commended in the journal’s editorial for its study design, and it debunked hydroxychloroquine as a COVID-19 treatment, indicating that the drug was in fact most likely harmful. The World Health Organisation quickly halted its global hydroxychloroquine trials over safety concerns (3).

For medical data scientists, this was a benchmark study showing how retrospective data and real-world evidence could be used to understand the effectiveness of treatments without the high cost of randomised controlled trials (RCTs). The study was a notable application of propensity matching on retrospective data to compare treatments, an important methodology in our AI toolkit. Our data science team remarked on the importance of the study, although at the time we discussed how the authors might have collected some of the data points, such as post-treatment arrhythmias, as these are burdensome observations that are not routinely collected. Nevertheless, we initially accepted the validity of a peer-reviewed Lancet study at face value.
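To make the methodology concrete, here is a minimal sketch of propensity matching under simplifying assumptions. It is not the method or code used in the Mehra et al. study. The propensity score (the estimated probability that a patient received the treatment, given their covariates) is fitted with plain logistic regression by gradient descent, and each treated patient is then paired with the nearest unused untreated patient whose score lies within a caliper. The function names, the learning rate, the epoch count and the default caliper of 0.05 are all illustrative assumptions; a real study would also check covariate balance after matching.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_propensity(X: Sequence[Sequence[float]], treated: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit P(treated | covariates) with plain logistic regression using
    batch gradient descent. Returns weights with the intercept first."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, t in zip(X, treated):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) - t
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def propensity_scores(w: Sequence[float],
                      X: Sequence[Sequence[float]]) -> List[float]:
    """Estimated probability of receiving treatment for each patient."""
    return [_sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) for x in X]


def match_pairs(scores: Sequence[float], treated: Sequence[int],
                caliper: float = 0.05) -> List[Tuple[int, int]]:
    """Greedy 1:1 nearest-neighbour matching without replacement. Each treated
    patient is paired with the closest unused control whose score is within
    `caliper`; treated patients with no such control stay unmatched."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    pairs: List[Tuple[int, int]] = []
    for i, t in enumerate(treated):
        if t != 1 or not controls:
            continue
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


def matched_difference(outcome: Sequence[float],
                       pairs: Sequence[Tuple[int, int]]) -> float:
    """Mean outcome difference (treated minus control) over matched pairs."""
    if not pairs:
        raise ValueError("no matched pairs")
    return sum(outcome[i] - outcome[j] for i, j in pairs) / len(pairs)
```

With hand-picked scores `[0.1, 0.5, 0.12, 0.52]` and `treated = [0, 0, 1, 1]`, `match_pairs` pairs patient 2 with control 0 and patient 3 with control 1. The estimate is only as trustworthy as the covariates and outcomes fed into it, which is exactly why the provenance of those records matters.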

Within days, questions began to emerge on social media about the validity of some of the reported data points, and then Guardian Australia broke the story raising serious concerns about the validity of the study (4). The initial concerns centred on glaring errors in the patient data purportedly extracted from six hospitals in Australia: the reported deaths exceeded the number of COVID-19 deaths actually recorded in Australia at that time. As the concerns were investigated further, doubts emerged about whether the little-known analytics company Surgisphere had the capacity or capability to obtain and process data on 96,000 patients from many hospitals around the world. The data appeared overly complete (lacking the usual levels of missing data that require imputation), and feature distributions seemed overly consistent. Accusations of data fraud emerged. Criticism of editorial rigour soon followed:
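The two statistical red flags described above can be screened for automatically. The sketch below is a hypothetical illustration, not a fraud detector and not the analysis performed by the study’s critics. It assumes records are grouped by site as dictionaries, flags any field whose overall missing-data rate is implausibly low, and flags any field whose per-site means vary implausibly little (a low coefficient of variation across sites). The thresholds `min_missing=0.01` and `min_cv=0.02` are arbitrary assumptions; sensible values depend on the field and the clinical setting.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def missing_rate(records: List[Record], field: str) -> float:
    """Fraction of records where `field` is absent or None."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is None) / len(records)


def site_means(sites: Dict[str, List[Record]], field: str) -> Dict[str, float]:
    """Mean of the observed (non-missing) values of `field` at each site."""
    out: Dict[str, float] = {}
    for site, records in sites.items():
        values = [r[field] for r in records if r.get(field) is not None]
        if values:
            out[site] = mean(values)
    return out


def red_flags(sites: Dict[str, List[Record]], fields: List[str],
              min_missing: float = 0.01, min_cv: float = 0.02) -> List[str]:
    """Return warnings for data that look 'too good': almost nothing missing,
    or site-level means that are nearly identical across sites."""
    flags: List[str] = []
    all_records = [r for recs in sites.values() for r in recs]
    for field in fields:
        rate = missing_rate(all_records, field)
        if rate < min_missing:
            flags.append(f"{field}: only {rate:.1%} missing")
        means = list(site_means(sites, field).values())
        if len(means) >= 2 and mean(means) != 0:
            cv = pstdev(means) / abs(mean(means))  # between-site variation
            if cv < min_cv:
                flags.append(f"{field}: between-site CV {cv:.3f}")
    return flags
```

A flag is a prompt for questions about how the data were collected, not evidence of fabrication: some registries genuinely are complete, and some populations genuinely are homogeneous.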

“The very serious concerns being raised about the validity of the papers by Mehra et al need to be recognised and actioned urgently, and ought to bring about serious reflection on whether the quality of editorial and peer review during the pandemic has been adequate. Scientific publication must above all be rigorous and honest. In an emergency, these values are needed more than ever.”

Soon afterwards, The Lancet published an expression of concern (5) and then retracted the paper (6). The authors not associated with Surgisphere requested the retraction because Surgisphere did not give the independent third-party peer reviewers they had appointed access to the data, contracts and agreements needed to conduct their review.

Broader investigations of other studies involving Surgisphere followed, leading to the retraction of another study in the New England Journal of Medicine (7).

The Lancet, the New England Journal of Medicine and other journals are now scrutinising their editorial processes for big data research (8):

NEJM Spokesperson: “We have limited experience with reviewing or publishing studies like this one, which used a large database based on electronic medical records. The reviewers and editors asked the authors questions about the data sources and data validity. The editors accepted the authors’ responses, rather than asking for help from reviewers with expertise in this type of data. In the future, our review process of big data research will include reviewers with such specific expertise.”

Lancet Spokesperson: “We are reviewing our requirements for data sharing and validation among authors, and data sharing following publication”

This retraction raises serious concerns for trust in data and AI. The community is less aware of provenance risks, which receive less attention than other trust issues such as patient privacy and the ethical application of AI.

Data used as real-world evidence will be key to advancing medicine through AI. However, the large reputational and commercial rewards involved may incentivise data fraud. Data is the new oil for AI, and companies that make data available are achieving staggering valuations; for example, Roche acquired Flatiron Health in 2018 for US$1.9 billion (9).

Mechanisms will be required to verify the provenance of data, and these will need to be closely aligned with data sharing for independent scrutiny. At the same time, they must be balanced against data privacy and the protection of intellectual property. Organisations undertaking data collection and curation will need to establish their bona fides through transparency, certification, and cooperation with ethics committees, editorial governance bodies and industry regulators.
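One building block for such mechanisms is a tamper-evident provenance log. The sketch below is a hypothetical design, not an existing standard or any journal’s requirement. Each data extract is recorded with its claimed source, extraction time and SHA-256 content hash, and each entry also stores the hash of the previous entry, so altering any earlier record breaks the chain. An independent reviewer given the log and the shared files can then confirm that the files are the ones that were logged. The field names, the `GENESIS` value and the example sources are all assumptions.

```python
import hashlib
import json
from typing import Dict, List, Sequence

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _entry_digest(body: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so identical fields always hash identically.
    return _sha256(json.dumps(body, sort_keys=True).encode("utf-8"))


def append_entry(log: List[Dict[str, str]], source: str,
                 extracted_at: str, payload: bytes) -> Dict[str, str]:
    """Record a data extract: its origin, extraction time, content hash and
    the hash of the previous entry, so later edits anywhere break the chain."""
    body = {
        "source": source,
        "extracted_at": extracted_at,
        "payload_sha256": _sha256(payload),
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry = dict(body, entry_hash=_entry_digest(body))
    log.append(entry)
    return entry


def verify(log: Sequence[Dict[str, str]], payloads: Sequence[bytes]) -> bool:
    """Check that the shared payloads match the log and the log is unaltered."""
    if len(log) != len(payloads):
        return False
    prev = GENESIS
    for entry, payload in zip(log, payloads):
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body.get("prev_hash") != prev or _entry_digest(body) != entry["entry_hash"]:
            return False
        if _sha256(payload) != entry["payload_sha256"]:
            return False
        prev = entry["entry_hash"]
    return True
```

A hash chain only proves that the shared data match what was logged; it cannot prove that the logged data came from real hospitals. It therefore complements, rather than replaces, independent access to sources, audits and the governance described above.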

Our data science team was shocked as scrutiny laid open what now appears to be fabricated data. Our industry needs to address these shortcomings, as bad actors will damage public trust. At the same time, we need to build transparency without creating major delays and burdens, because the increasing cost and duration of clinical trials has consequences for patients awaiting the evaluation of treatments. Never has the balance between the speed of medical discovery and trust in our methodologies and results been more important than now, in the global race to find a COVID-19 cure.

Works Cited
  1. Mandeep R Mehra, Sapan S Desai, Frank Ruschitzka, Amit N Patel. Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. The Lancet. May 22, 2020.
  2. Christian Funck-Brentano, Joe-Elie Salem. Chloroquine or hydroxychloroquine for COVID-19: why might they be hazardous? The Lancet. May 22, 2020.
  3. The Guardian Australia. WHO halts hydroxychloroquine trial for coronavirus amid safety fears. [Online] May 26, 2020.
  4. Melissa Davey, Stephanie Kirchgaessner, Sarah Boseley. Surgisphere: governments and WHO changed Covid-19 policy based on suspect data from tiny US company. [Online]
  5. Expression of concern: Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. Jun 3, 2020, The Lancet.
  6. Retraction—Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. Jun 5, 2020, The Lancet. DOI:
  7. Melissa Davey, Stephanie Kirchgaessner. Surgisphere: mass audit of papers linked to firm behind hydroxychloroquine Lancet study scandal. [Online] The Guardian Australia, Jun 10, 2020.
  8. Davey, Melissa. Covid-19 studies based on flawed Surgisphere data force medical journals to review processes. [Online] The Guardian Australia, Jun 12, 2020.
  9. Das, Reenita. The Flatiron Health Acquisition Is A Shot In The Arm For Roche’s Oncology Real-World Evidence Needs. [Online] Forbes, Feb 26, 2018.