Metagenomic Geolocation with Read Signature Clustering
Timothy Chappell (0), Shlomo Geva (0), James Hogan (0), David Lovell (1), Dimitri Perrin (0), Andrew Trotman (2)
(0) Queensland University of Technology
(1) Queensland University of Technology
(2) University of Otago
Find me on Wed Nov 25th, 1:30-2:50pm AEDT in Remo, table 122
Abstract
Metagenomic sequencing produces large quantities of reads to characterise environmental samples. These reads can be binned and assembled, or simply fed into a fast sequence classification tool such as Kraken, but only at enormous computational expense. We present a novel approach that takes an entire metagenomic sample and reduces it to a small number of vectors that characterise the underlying sample effectively enough that they can be used to predict its geographic origin.
In this presentation we will introduce an approach wherein we compute read signatures using random orthonormal k-mer vectors and cluster them down to a small number of centroids that retain much of the expressivity of the original samples. We will then show, using ground truth from the CAMDA Metagenomic Location Challenge, that we are able to use the resulting read signatures in conjunction with a nearest-neighbour classifier to predict the geographic locations of many of the metagenomic samples.
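The pipeline described above (k-mer vectors, read signatures, clustering to centroids, nearest-neighbour prediction) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation. The abstract does not give the k-mer length, the signature dimension, the clustering algorithm, or the distance between samples, so the choices here are all assumptions: `K = 3`, one 64-dimensional vector per k-mer drawn from the QR decomposition of a random matrix, a plain k-means with farthest-first initialisation, and a symmetric nearest-centroid distance. All function names are hypothetical.

```python
import itertools
import numpy as np

K = 3  # assumed k-mer length; small enough that all 4**K vectors can be exactly orthonormal
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def random_orthonormal_vectors(n, seed=0):
    """Return an n x n matrix whose rows are random orthonormal vectors."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q  # q is orthogonal, so its rows are orthonormal


def read_signature(read, kmer_vectors):
    """Sum the vectors of all k-mers in a read and scale to unit length."""
    sig = np.zeros(kmer_vectors.shape[1])
    for i in range(len(read) - K + 1):
        idx = KMER_INDEX.get(read[i:i + K])  # k-mers with ambiguous bases (e.g. N) are skipped
        if idx is not None:
            sig += kmer_vectors[idx]
    norm = np.linalg.norm(sig)
    return sig / norm if norm > 0 else sig


def kmeans(points, n_clusters, n_iter=20):
    """Plain k-means with deterministic farthest-first initialisation (an assumed choice)."""
    centroids = [points[0]]
    while len(centroids) < n_clusters:
        # Distance from every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)  # copies, so the input points are not modified
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = points[labels == c]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    return centroids


def sample_centroids(reads, kmer_vectors, n_clusters=2):
    """Reduce a whole sample (a list of reads) to a few centroid vectors."""
    sigs = np.array([read_signature(r, kmer_vectors) for r in reads])
    return kmeans(sigs, n_clusters)


def sample_distance(a, b):
    """Symmetric mean nearest-centroid distance between two centroid sets (assumed metric)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())


def predict_location(query, labelled):
    """1-nearest-neighbour: return the label of the closest labelled sample."""
    return min(labelled, key=lambda item: sample_distance(query, item[1]))[0]
```

To use the sketch, build the k-mer vectors once with `random_orthonormal_vectors(len(KMERS))`, call `sample_centroids` on each labelled sample and on the query sample, and pass the query centroids together with a list of `(label, centroids)` pairs to `predict_location`. Real reads would call for a larger `K`, where exact orthonormality is no longer practical and approximately orthogonal random vectors would have to stand in.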