Revealing interactions between coding and non-coding transcripts in plants using heterogeneous networks

Joel Robertson0, Stephen Davis1, Nitin Mantri1, Alice Johnstone1
(0) RMIT
(1) RMIT University

Find me on Wed Nov 25th, 1:30-2:50pm AEDT in Remo, table 42

Abstract
Once thought to be junk genetic material, non-coding RNAs are increasingly recognised as playing an important role in transcription regulation, RNA splicing, chromatin modification and translation control.
Dysregulation of non-coding transcripts has also been associated with a number of human diseases.
In plants, however, their role is less understood.
High-throughput sequencing allows us to measure co-expression between transcripts, and transcripts that are co-expressed across a set of samples are likely to be involved in similar cellular processes.
A useful technique for analysing this co-expression information is to view it as a network.
Network science is the study of complex, non-random systems represented abstractly as a set of nodes along with a set of links that signify an interaction between a node pair.
The benefit of framing expression data in this way is that it allows the use of a range of measures that focus on global and local structures in the network to infer relationships between nodes.
Community detection methods are commonly used to analyse the global co-expression network topology, grouping transcripts into modules that allow functional annotation of rare or unknown transcripts.
Local network structure carries additional, less-utilised information, and graphlet counting offers a way to capture it.
Graphlets are small-scale subnetworks that can repeat many times throughout a larger network.
If a particular graphlet is significantly overrepresented in a network compared with what would be expected by chance, it is designated a network motif.
Similarly, a network can be characterised by its graphlet profile, allowing comparison of different networks based on this higher-order information.
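
As a rough illustration of the ideas above, the following minimal Python sketch builds an undirected co-expression network from a toy expression table and counts its two connected 3-node graphlets (open paths and triangles). It is not the workflow used in this project: the function names, the use of absolute Pearson correlation, and the 0.9 threshold are all assumptions chosen for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_network(expression, threshold=0.9):
    """Link two transcripts when |r| >= threshold (threshold is an assumption)."""
    adjacency = {transcript: set() for transcript in expression}
    for u, v in combinations(expression, 2):
        if abs(pearson(expression[u], expression[v])) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def three_node_graphlets(adjacency):
    """Count induced 3-node paths and triangles in an undirected network."""
    paths = 0
    triangle_corners = 0
    for centre, neighbours in adjacency.items():
        for a, b in combinations(sorted(neighbours), 2):
            if b in adjacency[a]:
                triangle_corners += 1  # each triangle is seen from 3 corners
            else:
                paths += 1  # a - centre - b with no a - b link
    return {"path": paths, "triangle": triangle_corners // 3}


# Toy expression values for four hypothetical transcripts across four samples.
expression = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [3, 5, 7, 9],
    "d": [1, 3, 2, 4],
}
network = coexpression_network(expression, threshold=0.9)
profile = three_node_graphlets(network)
```

The resulting dictionary is a very small graphlet profile. Judging whether a graphlet is a motif would additionally require comparing these counts against counts from suitably randomised networks, which this sketch does not attempt.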

In the biological context, graphlet counting has mainly been deployed on directed networks (e.g. gene regulatory networks or protein-protein interaction networks), where more graphlet types are available to characterise and differentiate networks.
This increased granularity can also be obtained in undirected co-expression networks if they are constructed as coding/non-coding heterogeneous networks.
Graphlet counting then also provides a method to examine the relationships between mRNA protein-coding transcripts and the less understood families of non-coding RNA.
Our research employs graphlet counting techniques to identify significant patterns of interaction between coding and non-coding transcripts in plants.
Raw plant sequencing data obtained from _Cicer arietinum L._ (chickpea) samples are assembled into a _de novo_ transcriptome, and each transcript is classified as coding or non-coding via a filtering process.
After quantification of transcript expression counts, whole-transcriptome heterogeneous co-expression networks are constructed, and a typed graphlet counting algorithm is applied to characterise the network by its higher-order structure.
Significant patterns between coding and non-coding transcripts reveal information about regulatory interactions and ultimately identify a set of candidate non-coding transcripts to be investigated experimentally.
A comparison of networks across different experimental conditions is also used to indicate which coding/non-coding interactions are particularly important at different stages of the plant’s lifecycle.
The longer-term goal of the project is to integrate graphlet counting with widely used module detection workflows, giving biologists a richer network analysis toolset.
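
To make the idea of typed graphlets concrete, here is a minimal Python sketch that counts 3-node graphlets in an undirected network whose nodes are labelled as coding or non-coding. It is only an illustration under stated assumptions, not the typed graphlet counting algorithm used in this work: the function name, the label strings, and the toy network are hypothetical, and node names are assumed to be sortable strings.

```python
from collections import Counter
from itertools import combinations


def typed_three_node_graphlets(adjacency, node_type):
    """Count 3-node graphlets, keyed by shape and by the types of their nodes.

    Paths are keyed as ("path", centre type, end type, end type) and
    triangles as ("triangle", type, type, type), with types sorted so that
    equivalent graphlets share one key.
    """
    counts = Counter()
    for centre, neighbours in adjacency.items():
        for a, b in combinations(sorted(neighbours), 2):
            if b in adjacency[a]:
                # Record each triangle once, from its smallest-named corner.
                if centre < a:
                    types = sorted(node_type[n] for n in (centre, a, b))
                    counts[("triangle", *types)] += 1
            else:
                ends = sorted((node_type[a], node_type[b]))
                counts[("path", node_type[centre], *ends)] += 1
    return counts


# Hypothetical toy network: three coding transcripts (m1-m3), one non-coding (n1).
adjacency = {
    "m1": {"m2", "n1"},
    "m2": {"m1", "n1"},
    "m3": {"n1"},
    "n1": {"m1", "m2", "m3"},
}
node_type = {"m1": "coding", "m2": "coding", "m3": "coding", "n1": "noncoding"}

typed_counts = typed_three_node_graphlets(adjacency, node_type)
```

In an untyped network this example would yield only two categories (paths and triangles). Adding node types splits each shape into several distinguishable graphlets, which is the extra granularity that a heterogeneous coding/non-coding network provides.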