Features of functional human genes

Paul Gardner0, Helena Cooper0
(0) University of Otago

Abstract
Proteins and non-coding RNAs are functional products of the genome that carry out the bulk of crucial cellular processes. With recent technological advances, researchers can sequence thousands of genomes and probe for specific genomic activities across multiple species and conditions. These studies have identified thousands of potential proteins, RNAs and associated activities; however, conclusions about their functional implications conflict depending on the burden of evidence researchers apply, leading to diverse interpretations of which regions of the genome are “functional”. Here we investigate the association between gene functionality and genomic features by comparing established functional protein-coding and non-coding genes to non-genic regions of the genome. We find that the genomic features most strongly and consistently associated with functional genes are evolutionary conservation and transcriptional activity, for both protein-coding and ncRNA sequences. Other strongly associated features include sequence alignment statistics, such as between-site covariation, and protein substitution scores, such as synonymous variation. In sum, our results demonstrate the importance of evolutionary conservation and transcriptional activity for sequence functionality, both of which should be taken into consideration when differentiating between truly functional sequences and noise.