pyTCR: a comprehensive cloud-based platform for TCR-Seq data analysis using interactive notebooks to facilitate reproducibility and rigor of immunogenomics research

Kerui Peng, Jaqueline Brito, Guoyun Kao, Serghei Mangul
University of Southern California

Find me on Wed Nov 25th, 1:30-2:50pm AEDT in Remo, table 52

Abstract
T cells are crucial components of the adaptive immune system and are activated upon exposure to antigens. During T cell development, the V (variable), D (diversity), and J (joining) gene segments in the T cell receptor (TCR) loci undergo V(D)J recombination, creating diverse repertoires that recognize and bind epitopes of antigens presented by the major histocompatibility complex (MHC). With the development of high-throughput sequencing, TCR-Seq provides opportunities to understand adaptive immune responses and can help with diagnosis, prognosis prediction, and treatment outcome prediction in a variety of diseases, including cancer, autoimmune disease, infectious disease, and allergies.

Due to the diversity and complexity of the TCR repertoire, computational methods are needed to understand its features. Existing tools have advanced TCR analysis. However, they fail to provide an easy-to-use interface for biomedical researchers with no or limited bioinformatics background. They do not offer integrative analysis, providing disjointed commands instead. Moreover, the analysis is not comprehensive, as other tools are usually needed to complete it. Furthermore, existing tools offer limited options to customize the analysis and visualization. An alternative solution is urgently needed in this field.

pyTCR is a comprehensive platform with a rich set of TCR repertoire analysis functionalities for biomedical researchers. Our cloud-based, easy-to-use platform is built on interactive notebooks, which enhance reproducibility and transparency by making both code and results available to users, and it provides comprehensive, integrative functions and customizable manipulations. pyTCR provides basic sample statistics (such as number of reads, number of clonotypes, and convergence), clonality analysis, overlap analysis, segment usage analysis, diversity analysis, and motif analysis. For each analysis type, metrics, visualization, and statistical analysis are provided, offering a comprehensive solution for TCR analysis.
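
To make a few of the metrics named above concrete, the following is a minimal sketch of how basic sample statistics, diversity, clonality, and overlap might be computed in a notebook cell. It is not pyTCR's actual code or API: the function names, the use of CDR3 amino acid sequences as clonotype identifiers, the choice of Shannon entropy for diversity, the definition of clonality as one minus normalized entropy, and the Jaccard index for overlap are all assumptions made for illustration.

```python
import math
from collections import Counter


def clonotype_counts(cdr3_reads):
    """Count reads per clonotype (assumption: a clonotype is one CDR3 string)."""
    return Counter(cdr3_reads)


def shannon_entropy(counts):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def clonality(counts):
    """Clonality as 1 - normalized Shannon entropy (one common convention)."""
    n = len(counts)
    if n == 0:
        raise ValueError("sample has no clonotypes")
    if n == 1:
        return 1.0  # a single clonotype is treated as fully clonal
    return 1.0 - shannon_entropy(counts) / math.log(n)


def jaccard_overlap(sample_a, sample_b):
    """Jaccard index of the clonotype sets of two samples."""
    set_a, set_b = set(sample_a), set(sample_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical toy samples: each list element stands for one read.
sample_1 = ["CASSLGQETQYF", "CASSLGQETQYF", "CASSPDRGYEQYF", "CASSQDTQYF"]
sample_2 = ["CASSLGQETQYF", "CASSFGTEAFF"]

counts_1 = clonotype_counts(sample_1)
print("reads:", sum(counts_1.values()))
print("clonotypes:", len(counts_1))
print("Shannon entropy:", shannon_entropy(counts_1))
print("clonality:", clonality(counts_1))
print("Jaccard overlap:", jaccard_overlap(sample_1, sample_2))
```

Because each metric is a small, visible function, a user can inspect or change a definition (for example, swapping the Jaccard index for another overlap measure) directly in the notebook, which is the kind of transparency and customization the platform is meant to offer.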

The existing gap between traditional biomedical research and bioinformatics poses a substantial barrier for biomedical researchers who want to use computational tools to analyze high-throughput data. Our tool illustrates the capacity of cloud-based notebooks to bridge this gap: users with no or limited bioinformatics background or experience can use notebooks to analyze data with transparent analysis and reproducible results.