Comprehensive analysis of usability and archival stability of RNA-seq tools

Karishma Chhugani (0), Dhrithi Deshpande (0), Serghei Mangul (1)
(0) University of Southern California
(1) University of California, Los Angeles

Find me on Tues Nov 24th, 1:40-3pm AEDT in Remo, table 77

Abstract
As technology has advanced, RNA-seq has become increasingly popular and is now an exemplar technology for transcriptome analysis, revolutionizing modern biology and clinical applications over the past decade. It has gained immense momentum, driven by the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools. RNA-seq data analyzed with these tools can be used to tackle important biological problems, such as estimating gene expression profiles across phenotypes and conditions or detecting novel alternative splicing at specific exons. We surveyed 235 computational tools developed from 2008 to 2020 across 15 domains of RNA-seq analysis. The average annual growth rate of computational tools developed for RNA-seq analysis was 114.4% from 2008 to 2014, but the rate of new tool development slowed after 2015; the average annual growth rate from 2015 to 2020 was 8.97%. On average, across all domains, 18 tools were developed each year between 2008 and 2020. We also assessed the usability and archival stability of the computational tools designed for various types of RNA-seq analysis. Maintaining the archival stability of bioinformatics tools is increasingly important for preserving scientific transparency and reproducibility. The majority of the tools in our survey are stored on archivally stable repositories (e.g., GitHub), while the others are hosted on personal or academic webpages, which often have limited archival stability. We also assessed the computational expertise required to install and use RNA-seq tools. The vast majority of tools require the user to operate a command-line interface, and only 8.09% of tools were web-based. We also compared the availability of package managers across RNA-seq tools and found that the majority of tools lack a package manager implementation.
According to our survey, only 41.4% of the tools are available through package managers. Among these tools, Anaconda was the most commonly used package manager platform, followed by Bioconductor and CRAN. Lastly, we evaluated the effect of usability on the popularity of RNA-seq tools. We found that tools available through package managers had significantly more citations per year (p = 1.85 × 10^-7) than tools not available through package managers (p = 1.43 × 10^-7). In addition to providing information about the usability and archival stability of the tools, we plan to create a resource that will engage the biomedical community by letting researchers share feedback on their use of these tools. We hope our resources will help researchers make more informed decisions when selecting a tool for a specific type of data and research question.
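
The per-year figures quoted in the abstract follow from simple arithmetic, sketched below in Python. The abstract does not state how the average annual growth rate was computed; the function here assumes it is the arithmetic mean of year-over-year percentage changes, which may differ from the authors' definition. The name mean_annual_growth_rate and the example counts are hypothetical and are not survey data; only the total of 235 tools and the 2008 to 2020 span come from the abstract.

```python
def mean_annual_growth_rate(counts):
    """Arithmetic mean of year-over-year percentage changes.

    Assumed definition for illustration; the abstract does not specify one.
    """
    if len(counts) < 2:
        raise ValueError("need at least two years of counts")
    changes = []
    for prev, curr in zip(counts, counts[1:]):
        if prev == 0:
            raise ValueError("growth rate is undefined after a zero count")
        changes.append((curr - prev) / prev * 100.0)
    return sum(changes) / len(changes)


# Figures stated in the abstract.
TOTAL_TOOLS = 235
FIRST_YEAR, LAST_YEAR = 2008, 2020

n_years = LAST_YEAR - FIRST_YEAR + 1      # inclusive span of survey years
tools_per_year = TOTAL_TOOLS / n_years    # average tools developed per year

# Hypothetical yearly counts, for illustration only (NOT survey data).
example_counts = [2, 4, 8, 16]

print(f"{n_years} years, about {tools_per_year:.0f} tools per year")
print(f"example growth rate: {mean_annual_growth_rate(example_counts):.1f}%")
```

Dividing 235 tools by the 13 survey years gives roughly 18 tools per year, which matches the figure reported above.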