Long short-term memory RNN for mirtron identification
Ke Ding (0), Jiayu Wen (0)
(0) Department of Genome Sciences, John Curtin School of Medical Research, Australian National University
Find me on Tues Nov 24th, 1:40-3pm AEDT in Remo, table 89
Abstract
MicroRNAs (miRNAs) are small regulatory RNAs that mediate extensive networks of post-transcriptional regulation and are implicated in a variety of diseases. Most miRNAs are generated by the canonical pathway involving Drosha and Dicer; in contrast, we discovered a new subclass of miRNAs, termed mirtrons, that bypass Drosha cleavage and instead generate functional miRNAs via splicing. Because mirtrons are expressed at low levels, they are difficult to detect by high-throughput sequencing. In this study, we use a recurrent neural network (RNN) architecture known as long short-term memory (LSTM) to build a model for mirtron identification. Unlike classical machine learning methods, which require pre-selected features to build their models, deep neural networks can learn relevant features directly from the data without defining them in advance. Compared with other deep neural network models such as convolutional neural networks (CNNs), LSTM networks can capture long-range dependencies in sequential data, and they are known for mitigating the vanishing gradient problem that traditional RNNs face on long sequences. Using only sequence data as input, our model achieves an area under the ROC curve (AUC) of 0.857. Compared with other pre-miRNA classification models, our model outperforms them all, achieving an F1 score of 95.3 (a 3.6-point absolute improvement). By training and testing our model on mammalian species, we verified that human and mouse mirtrons share similar sequence and structural features. Concatenating small RNA sequencing coverage data with the sequence data as input further improved performance, raising the AUC to 0.896. We also experimented with an attention mechanism to visualize the sequence segments most critical for identifying mirtrons. Finally, we ran our model genome-wide and predicted novel mirtrons for future study.
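To make the model description above more concrete, here is a minimal NumPy sketch of the forward pass of an LSTM classifier with attention pooling. It is not the authors' implementation: the abstract does not specify the architecture, so the following are all assumptions. Nucleotides are one-hot encoded, small RNA coverage is appended as a fifth per-position channel (log-scaled), there is a single LSTM layer, and attention is a simple softmax over per-position scores. The names `encode` and `LSTMAttentionClassifier` are hypothetical, the weights are random and untrained, and the toy sequence is made up rather than a real mirtron.

```python
import numpy as np

NUCLEOTIDES = "ACGU"

def encode(seq, coverage=None):
    """One-hot encode an RNA sequence (T is treated as U). Optionally append a
    per-position coverage channel (hypothetical input layout)."""
    seq = seq.upper().replace("T", "U")
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        if base in NUCLEOTIDES:
            x[i, NUCLEOTIDES.index(base)] = 1.0  # unknown bases (e.g. N) stay all-zero
    if coverage is not None:
        cov = np.log1p(np.asarray(coverage, dtype=float)).reshape(-1, 1)  # assumed log scaling
        x = np.concatenate([x, cov], axis=1)  # sequence + coverage, shape (T, 5)
    return x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMAttentionClassifier:
    """Single-layer LSTM + softmax attention pooling + logistic output (sketch only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rng.uniform(-s, s, (4 * hidden_dim, input_dim))
        self.U = rng.uniform(-s, s, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.w_att = rng.uniform(-s, s, hidden_dim)  # scores each hidden state
        self.w_out = rng.uniform(-s, s, hidden_dim)
        self.b_out = 0.0
        self.H = hidden_dim

    def forward(self, x):
        H = self.H
        h, c = np.zeros(H), np.zeros(H)
        states = []
        for x_t in x:  # process positions 5'->3'
            z = self.W @ x_t + self.U @ h + self.b
            i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
            g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
            c = f * c + i * g  # additive cell update eases gradient flow
            h = o * np.tanh(c)
            states.append(h)
        hs = np.stack(states)                        # (T, H)
        scores = hs @ self.w_att                     # (T,)
        weights = np.exp(scores - scores.max())
        alpha = weights / weights.sum()              # attention over positions
        context = alpha @ hs                         # (H,)
        prob = sigmoid(context @ self.w_out + self.b_out)
        return prob, alpha

# Usage with a made-up sequence and made-up coverage values.
seq = "GUAAGGCUAGCUAGCAUCCAG"
coverage = [0] * 5 + [12] * (len(seq) - 5)
model = LSTMAttentionClassifier(input_dim=5, hidden_dim=16)
prob, alpha = model.forward(encode(seq, coverage))
print(f"P(mirtron) = {prob:.3f}")  # untrained weights: value is arbitrary
top = int(np.argmax(alpha))
print("most-attended position:", top, seq[top])
```

The additive cell update (`c = f * c + i * g`) is what lets LSTMs propagate information across long sequences better than plain RNNs. The attention weights `alpha` give one importance value per position, which is the kind of signal that can be plotted to highlight critical segments. Dropping the `coverage` argument gives the sequence-only input variant.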