Transfer learning for data integration of single-cell RNA-seq and ATAC-seq

Yingxin Lin (0), Tung-Yu Wu (1), Sheng Wan (2), Jean Yang (3), Wing H. Wong (1), Y. X. Rachel Wang (3)
(0) The University of Sydney
(1) Stanford University
(2) Igistec
(3) The University of Sydney

Find me on Wed Nov 25th, 1:30-2:50pm AEDT in Remo, table 36

Abstract
Single-cell transcriptomic profiling with single-cell RNA-seq (scRNA-seq) has provided unprecedented resolution in characterising cell identities and cell functions across diverse tissues and conditions. Recent advances in measuring multiple modalities of single cells, such as single-cell ATAC sequencing (scATAC-seq), further enable cells to be characterised from different aspects. While scATAC-seq data provide an epigenomic profile of each cell, their extreme sparsity limits their power for cell type identification. Integrating scRNA-seq and scATAC-seq therefore enables not only the transfer of cell type labels but also a better understanding of cellular phenotypes.

Here, we present scJoint, a new end-to-end semi-supervised transfer learning algorithm for integrating heterogeneous collections of scRNA-seq and scATAC-seq data. By combining neural-network-based dimension reduction with a semi-supervised cell type prediction model, our algorithm transfers labels from scRNA-seq to scATAC-seq data and constructs a joint embedding of the two modalities. We illustrate its performance on unpaired scRNA-seq and scATAC-seq data collections, including a large unpaired mouse cell atlas (177,577 cells, 82 cell types) and multimodal data coupled with protein-level profiles. Our algorithm outperforms existing methods by a large margin in both joint visualisation of the two modalities and cell type prediction, improving accuracy by 7–14%. Using paired transcriptomic and epigenomic data as ground truth, we have further verified the label transfer performance of our algorithm.
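The abstract does not spell out scJoint's architecture, so the following is only a toy sketch of the general idea of label transfer through a shared embedding, not the scJoint implementation. It assumes both modalities are already expressed in a common gene space (for example, scATAC-seq summarised as per-gene activity values, an assumption here). A linear PCA projection fitted on the RNA cells stands in for the neural-network encoder, and each ATAC cell receives the majority label of its k nearest RNA cells by cosine similarity. All function names (`standardise_cells`, `fit_linear_encoder`, `embed`, `knn_transfer`, `simulate`), the synthetic data and the choice of k are hypothetical.

```python
import numpy as np


def standardise_cells(X):
    """Z-score each cell (row) across genes so RNA and ATAC-derived values share a scale."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / np.maximum(sd, 1e-12)


def fit_linear_encoder(X_ref, n_components):
    """Hypothetical stand-in for a learned encoder: PCA directions fitted on reference (RNA) cells."""
    mean = X_ref.mean(axis=0)
    _, _, vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    return mean, vt[:n_components].T  # shape (n_genes, n_components)


def embed(X, mean, W):
    """Project cells into the shared space and L2-normalise so dot products are cosine similarities."""
    Z = (X - mean) @ W
    return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)


def knn_transfer(Z_ref, labels_ref, Z_query, k=5):
    """Assign each query cell the majority label among its k most similar reference cells."""
    sims = Z_query @ Z_ref.T                   # (n_query, n_ref) cosine similarities
    nn_idx = np.argsort(-sims, axis=1)[:, :k]  # indices of the top-k neighbours
    preds = []
    for row in labels_ref[nn_idx]:
        values, counts = np.unique(row, return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)


# Synthetic toy data for illustration only: two hypothetical cell types, 50 genes.
rng = np.random.default_rng(0)
n_per_type = 60
profiles = {"type_A": np.r_[np.full(25, 3.0), np.zeros(25)],
            "type_B": np.r_[np.zeros(25), np.full(25, 3.0)]}


def simulate(n, dropout):
    """Draw n noisy cells per type; `dropout` zeroes entries to mimic sparsity."""
    labels = np.repeat(list(profiles), n)
    base = np.stack([profiles[lab] for lab in labels])
    keep = rng.random(base.shape) >= dropout
    return base * keep + rng.normal(0.0, 0.5, base.shape), labels


X_rna, y_rna = simulate(n_per_type, dropout=0.0)
X_atac, y_atac = simulate(n_per_type, dropout=0.5)  # sparser, noisier "gene activity"

X_rna, X_atac = standardise_cells(X_rna), standardise_cells(X_atac)
mean, W = fit_linear_encoder(X_rna, n_components=5)
pred = knn_transfer(embed(X_rna, mean, W), y_rna, embed(X_atac, mean, W), k=5)
accuracy = np.mean(pred == y_atac)
print(f"label-transfer accuracy: {accuracy:.2f}")
```

Scoring predictions against known ATAC labels mirrors, in miniature, the evaluation idea in the abstract, where paired data supply the ground truth. On real data, a projection fitted on RNA alone would not correct modality-specific shifts; a jointly trained encoder is one way to address this.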