A comprehensive benchmarking of WGS-based structural variant callers

Varuni Sarwal (0), Sebastian Niehus (1), Ram Ayyala (0), Eleazar Eskin (0), Jonathan Flint (0), Serghei Mangul (2)
(0) University of California Los Angeles
(1) Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany
(2) University of Southern California

Find me on Tues Nov 24th, 1:40-3pm AEDT in Remo, table 103

Abstract
Structural variants (SVs) are genomic regions that contain an altered DNA sequence due to deletion, duplication, insertion, or inversion, and they vary in pathogenicity. Dissecting SVs from whole genome sequencing (WGS) data presents a number of challenges, and a plethora of SV-detection methods have been developed. Currently, there is a paucity of evidence that investigators can use to select appropriate SV-detection tools. We evaluated the performance of 15 SV-detection tools based on their ability to detect deletions from aligned WGS reads, using a comprehensive PCR-confirmed gold standard set of SVs, to find methods with a good balance between sensitivity and precision. While the number of true deletions is 3,710, the number of deletions detected by the tools ranged from 899 to 82,225. 53% of the methods reported fewer deletions than are known to be present in the sample. The length distribution of detected deletions varied across tools and was substantially different from the distribution of true deletions. 53% of the tools underestimated the true size of SVs, and the deletions detected by BreakDancer were the closest to the true median deletion length. We allowed the coordinates of detected deletions to deviate from the coordinates of the true deletions by a threshold ranging from 0 to 10,000 bp. Manta achieved the highest F-score at all thresholds. Methods with high specificity also tended to have significantly higher F-scores and precision. CLEVER achieved the highest sensitivity, while the most precise method was PopDel. We assessed the performance of SV callers at coverages from 32x to 0.1x, generated by down-sampling the original WGS data. DELLY showed the highest F-score for coverage below 4x, while Manta was the best performing tool from 8x to 32x. We also assessed the effect of deletion length on the accuracy of detection. Manta and CREST were the only tools with high specificity for deletions shorter than 500 bp. LUMPY was the only method able to deliver an F-score above 30% across all deletion length categories. Manta and LUMPY were the best performing tools for general applications. Our recommendations can help researchers choose the best SV-detection software and inform the developer community of the challenges of SV detection.
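To make the threshold-based comparison concrete, the following Python sketch shows one way detected deletions could be scored against a gold standard. It is an illustration under stated assumptions, not the evaluation code used in this study: the abstract does not specify the matching rule, so the sketch assumes a call counts as correct when both its start and end lie within the threshold of a true deletion on the same chromosome, and that each true deletion can be matched at most once. The names `Deletion`, `matches`, and `evaluate`, and the toy coordinates, are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Deletion(NamedTuple):
    """A deletion interval (hypothetical minimal representation)."""
    chrom: str
    start: int
    end: int


def matches(call: Deletion, truth: Deletion, threshold: int) -> bool:
    """Assumed rule: same chromosome, and both breakpoints within `threshold` bp."""
    return (
        call.chrom == truth.chrom
        and abs(call.start - truth.start) <= threshold
        and abs(call.end - truth.end) <= threshold
    )


def evaluate(
    calls: List[Deletion], truths: List[Deletion], threshold: int
) -> Tuple[float, float, float]:
    """Return (precision, sensitivity, F-score) for one coordinate threshold."""
    unmatched = list(truths)  # true deletions not yet claimed by a call
    true_positives = 0
    for call in calls:
        for i, truth in enumerate(unmatched):
            if matches(call, truth, threshold):
                true_positives += 1
                del unmatched[i]  # assumption: one-to-one matching
                break
    false_positives = len(calls) - true_positives
    false_negatives = len(unmatched)

    precision = true_positives / len(calls) if calls else 0.0
    sensitivity = true_positives / len(truths) if truths else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    assert false_positives >= 0 and false_negatives >= 0
    return precision, sensitivity, f_score


# Toy data, invented purely for illustration.
toy_truths = [
    Deletion("chr1", 1000, 2000),
    Deletion("chr1", 5000, 5600),
    Deletion("chr2", 300, 900),
]
toy_calls = [
    Deletion("chr1", 1010, 1990),
    Deletion("chr1", 5200, 5900),
    Deletion("chr3", 100, 200),
]

for t in (0, 100, 1000):
    p, s, f = evaluate(toy_calls, toy_truths, t)
    print(f"threshold={t} bp  precision={p:.2f}  sensitivity={s:.2f}  F={f:.2f}")
```

Sweeping the threshold, as in the loop above, shows how a stricter coordinate tolerance lowers both precision and sensitivity for the same call set. Other matching rules, such as reciprocal overlap, would give different numbers.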