Using equivalence classes for differential transcript usage and variant detection in RNA-seq data

Marek Cmero (0), Breon Schmidt (0), Ian Majewski (1), Paul Ekert (2), Nadia Davidson (0), Alicia Oshlack (0)
(0) Peter MacCallum Cancer Centre
(1) Walter and Eliza Hall Institute of Medical Research
(2) Children’s Cancer Institute

Find me on Wed Nov 25th, 1:30-2:50pm AEDT in Remo, table 30

Abstract
RNA sequencing (RNA-seq) has enabled high-throughput and fine-grained quantitative analyses of the transcriptome, and has been utilised in distinct contexts such as differential expression and fusion detection. Traditionally, RNA-seq analyses have used alignment to the genome to make downstream inferences on differential expression profiles or to detect transcriptional variants such as fusions. Genome alignment, however, is computationally expensive and can also lead to reference bias. Equivalence classes, which reflect the set of transcripts that a given read is compatible with, present an alignment-free alternative for isoform categorisation and quantification. Typically, equivalence classes are used as an intermediary unit to infer transcript abundance. We instead used equivalence class counts directly to perform differential transcript usage analysis, which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. We find that equivalence class counts have similar sensitivity and false discovery rates to exon-level counts, but can be generated in a fraction of the time through the use of pseudo-aligners.
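As an illustration of the idea above, the following minimal sketch groups reads into equivalence classes and counts them. It assumes that read-to-transcript compatibility is already available (in practice a pseudo-aligner would produce it); the function name, read identifiers and transcript identifiers are hypothetical and are not taken from any particular tool.

```python
from collections import Counter


def equivalence_class_counts(read_compat):
    """Count reads per equivalence class.

    read_compat maps a read id to the transcript ids that read is
    compatible with (hypothetical input format, for illustration only).
    An equivalence class is the unordered set of compatible transcripts.
    """
    counts = Counter()
    for transcripts in read_compat.values():
        ec = frozenset(transcripts)  # order of transcripts does not matter
        if ec:  # skip reads compatible with no transcript
            counts[ec] += 1
    return counts


# Toy data (made up): five reads over three transcripts of one gene.
reads = {
    "r1": ["tA", "tB"],
    "r2": ["tB", "tA"],  # same class as r1
    "r3": ["tA"],
    "r4": ["tB", "tC"],
    "r5": [],            # unmapped read, ignored
}

ec_counts = equivalence_class_counts(reads)
for ec, n in ec_counts.items():
    print(sorted(ec), n)
```

Counts of this form, tabulated per sample, could then be passed to a count-based differential usage test in place of exon-level counts.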

Equivalence classes can also be combined with de novo assembly to avoid reference bias, which may obscure variant isoforms. To demonstrate this, we present MINTIE, a catch-all variant finder that detects regular and irregular fusions, transcribed structural variants and splice variants by leveraging de novo assembly, equivalence classes and differential expression. We validated MINTIE on simulated and real data sets and compared it with eight other approaches for finding novel transcriptional variants. MINTIE detected all defined variant classes at high rates (>70%), which no other method achieved. Applying the method to real cancer and rare disease data revealed several novel variants of potential clinical significance.

We posit that equivalence classes are an efficient and flexible unit of quantification for diverse analyses, such as differential transcript usage and the detection of novel structural and splice variants.