ampir: an R package for fast genome-wide prediction of antimicrobial peptides

Legana Fingerhut, Ira Cooke, Jan Strugnell, David Miller, Norelle Daly
James Cook University

Find me on Wed Nov 25th, 1:30-2:50pm AEDT in Remo, table 24

Abstract
Antimicrobial peptides are natural antibiotics that form part of the innate immune system; they help defend the host against pathogens and regulate the microbiome. They occur in all life, are highly diverse, are mostly small (< 200 amino acids), and typically make up only a small proportion of a genome (~ 1%). This makes them very difficult to find. One way to discover antimicrobial peptides is to use statistical learning methods, but so far most attempts have focussed on a subset of sequences consisting mostly of mature peptides. Such models have limited utility for novel antimicrobial peptide discovery, because gene predictions usually provide only a predicted precursor protein sequence, within which the position of the much shorter mature peptide is rarely known. We created a classification model (a support vector machine with a radial kernel) trained specifically for genome-wide scanning and implemented it in an R package, ampir. ampir was designed for high throughput and supports parallelisation. It was tested on multiple test sets (including complete proteomes) and achieved high precision. ampir can be used to narrow the search space for novel antimicrobial peptides in genomes.
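
As a rough illustration of the workflow described above, the sketch below scores a small set of protein sequences with ampir and keeps the high-probability candidates. It assumes the interface documented in the package README: predict_amps() with its model and n_cores arguments, and a prob_AMP output column. The two sequences are made-up placeholders, not real proteins, and the 0.8 cutoff is an arbitrary choice for illustration rather than a recommended threshold.

```r
# Minimal sketch, assuming ampir is installed (install.packages("ampir")).
library(ampir)

# Hypothetical input: sequence names in the first column and amino acid
# sequences in the second. These sequences are invented placeholders.
proteins <- data.frame(
  seq_name = c("toy_protein_1", "toy_protein_2"),
  seq_aa   = c(
    "MKLLSALVLLGLAGCSSAQKRWKIFKKIEKVGRNIRDGIVKAGPAIAVLGEAKAL",
    "MSTEDLQQAVEELTGNPEDAASLEEQVRALSEDGSTPDELLASYEGDVDLAPEDS"
  ),
  stringsAsFactors = FALSE
)

# Score every sequence with the precursor-trained model.
# n_cores can be raised for larger inputs such as whole proteomes.
predictions <- predict_amps(proteins, model = "precursor", n_cores = 1)

# Keep candidates above the illustrative probability cutoff.
cutoff <- 0.8
candidates <- predictions[which(predictions$prob_AMP >= cutoff), ]
print(candidates)
```

For a real genome, the data frame would normally come from a FASTA file of predicted proteins (ampir provides read_faa() for this), and the cutoff would be chosen according to how many false positives can be tolerated in follow-up work.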