Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement
Bruce J. Walker, Thomas Abeel, Terrance Shea, Margaret Priest, Amr Abouelliel, Sharadha Sakthikumar, Christina A. Cuomo, Qiandong Zeng, Jennifer R. Wortman, Sarah Young, Ashlee M. Earl
TLDR
Pilon is a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions, which is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains.
Abstract:
Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3-5 kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants, including duplications, and to resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
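The abstract states that Pilon corrects bases in a draft assembly using aligned reads, but does not give the decision rule. As a rough illustration of the general idea of pileup-based base correction, the sketch below replaces a draft base only when enough reads cover a position and a strong majority of them agree on a different base. This is a simplified majority-vote sketch, not Pilon's algorithm: the function name `correct_bases`, the per-position pileup input, and the `min_depth` and `min_fraction` thresholds are hypothetical, and a production tool would likely also weigh base and mapping quality.

```python
from collections import Counter


def correct_bases(draft, pileups, min_depth=5, min_fraction=0.8):
    """Majority-vote correction of a draft sequence (illustrative only).

    draft:   draft assembly sequence as a string.
    pileups: one list per draft position holding the bases observed in
             reads aligned to that position, e.g. ["A", "A", "G"].
    Returns the corrected sequence and a list of (position, old, new) changes.
    Thresholds are hypothetical defaults chosen for the example.
    """
    if len(pileups) != len(draft):
        raise ValueError("need exactly one pileup per draft position")
    corrected = list(draft)
    changes = []
    for pos, bases in enumerate(pileups):
        depth = len(bases)
        if depth < min_depth:
            continue  # too little read evidence to override the draft base
        base, count = Counter(bases).most_common(1)[0]
        # Change the base only if a strong majority of reads disagree with the draft.
        if base != draft[pos] and count / depth >= min_fraction:
            corrected[pos] = base
            changes.append((pos, draft[pos], base))
    return "".join(corrected), changes


if __name__ == "__main__":
    draft = "ACGTA"
    pileups = [
        ["A"] * 6,
        ["C"] * 6,
        ["T"] * 5 + ["G"],  # 5 of 6 reads say T where the draft has G
        ["T"] * 6,
        ["A"] * 2,          # depth 2 is below min_depth, so left untouched
    ]
    print(correct_bases(draft, pileups))  # ('ACTTA', [(2, 'G', 'T')])
```

Requiring both a minimum depth and a minimum agreement fraction keeps the sketch from "correcting" the draft on the strength of one or two stray reads, which is the main risk of a naive majority vote.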
Citations
Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
Sergey Koren, Brian P. Walenz, Konstantin Berlin, Jason R. Miller, Nicholas H. Bergman, Adam M. Phillippy
TL;DR: Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences, is presented, demonstrating that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences or Oxford Nanopore technologies.
Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads.
TL;DR: Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low.
Assembly of long, error-prone reads using repeat graphs
TL;DR: Flye generates arbitrary paths in an unknown repeat graph (called disjointigs) and constructs an accurate repeat graph from these error-riddled disjointigs, which can then be used for genome assembly.
Nanopore sequencing and assembly of a human genome with ultra-long reads
Miten Jain, Sergey Koren, Karen H. Miga, Josh Quick, Arthur C. Rand, Thomas A. Sasani, John R. Tyson, Andrew D. Beggs, Alexander T. Dilthey, Ian T. Fiddes, Sunir Malla, Hannah Marriott, Tom Nieto, Justin O'Grady, Hugh E. Olsen, Brent S. Pedersen, Arang Rhie, Hollian Richardson, Aaron R. Quinlan, Terrance P. Snutch, Louise Tee, Benedict Paten, Adam M. Phillippy, Jared T. Simpson, Nicholas J. Loman, Matthew Loose
TL;DR: Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.