A comprehensive analysis of code and data availability in biomedical research

Dhrithi Deshpande (0), Ruiwei Guo (0), Serghei Mangul (1)
(0) University of Southern California, Los Angeles
(1) University of Southern California

Find me on Wed Nov 25th, 1:30-2:50pm AEDT in Remo, table 74

Abstract
In biomedical research, it is imperative not only to publish a detailed description of the study design, methodology, results and interpretation, but also to make all the biomedical data and code used for scientific analyses shareable, well documented and reproducible. The availability of analytical code and data is important for ensuring scientific transparency and reproducibility. However, raw data alone is not sufficient to make scientific analyses reproducible. We reviewed code and data availability in articles published between 2016 and 2020 in 11 prominent biomedical journals. Our current results indicate that while the majority of articles comply with the journals' data sharing policies, most are not accompanied by code: 98.5% of the research papers make data available, whereas only 40.9% make code available, and the remaining 59.1% do not share their code. Code sharing can help ensure the reproducibility and transparency of scientific analyses. For the research papers that do share code, we further intend to verify whether the code is usable and reproducible. We also plan to extend our survey to check whether every figure in an article is backed by code, to attempt to run the code to evaluate its reproducibility, and to record the language used for data analysis. We hope our results will help researchers and journals adopt best practices to ensure scientific transparency and reproducibility.