BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments
TLDR
Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly-related sequences.
Abstract:
The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. Removing ambiguously aligned regions, as well as other sources of bias such as highly variable (saturated) characters, has been shown to improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow precise manual curation of each dataset, there is a real need for efficient bioinformatic tools dedicated to this alignment character trimming step. This article presents a new software tool, BMGE (Block Mapping and Gathering with Entropy), designed to select the regions of a multiple sequence alignment that are suited for phylogenetic inference. For each character, BMGE computes a score closely related to an entropy value. The calculation of these entropy-like scores is weighted with BLOSUM or PAM similarity matrices in order to distinguish between biologically expected and unexpected variability for each aligned character. Sets of contiguous characters with a score above a given threshold are considered unsuited for phylogenetic inference and are removed. Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly-related sequences. BMGE also implements trimming and recoding methods aimed at minimizing phylogeny reconstruction artefacts due to compositional heterogeneity. BMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences. Java source code and executable are freely available at ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE/.
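The trimming idea described in the abstract (score each alignment column, then discard columns whose score exceeds a threshold) can be illustrated with a minimal sketch. This is not BMGE's implementation: it uses plain Shannon entropy rather than the similarity-matrix-weighted score the abstract describes, and the function names, the `threshold` and `min_block` parameters, and their default values are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-?"):
    """Shannon entropy (base 2) of the non-gap characters in one column.

    Simplification: BMGE weights its score with BLOSUM/PAM matrices;
    this sketch does not. All-gap columns score +inf so they are dropped.
    """
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return float("inf")
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def trim_alignment(seqs, threshold=1.0, min_block=1):
    """Keep columns with entropy <= threshold (hypothetical defaults).

    Runs of retained columns shorter than min_block are also discarded.
    Returns the trimmed sequences and the indices of the kept columns.
    """
    ncol = len(seqs[0])
    if any(len(s) != ncol for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    scores = [column_entropy([s[i] for s in seqs]) for i in range(ncol)]
    keep = [score <= threshold for score in scores]

    kept, run = [], []
    for i, ok in enumerate(keep + [False]):  # sentinel closes the last run
        if ok:
            run.append(i)
        else:
            if len(run) >= min_block:
                kept.extend(run)
            run = []

    trimmed = ["".join(s[i] for i in kept) for s in seqs]
    return trimmed, kept


if __name__ == "__main__":
    alignment = ["ACGTAC", "ACGAAC", "ACGCAC", "ACGGAC"]
    trimmed, kept = trim_alignment(alignment, threshold=1.0)
    print(kept)
    print(trimmed)
```

In this toy alignment only the fourth column varies, so it is the only one removed at the chosen threshold. Raising `min_block` additionally drops short conserved runs that are isolated between variable regions.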
Citations
Asgard archaea illuminate the origin of eukaryotic cellular complexity
Katarzyna Zaremba-Niedzwiedzka, Eva F. Caceres, Jimmy H. Saw, Disa Bäckström, Lina Juzokaite, Emmelien Vancaester, Kiley W. Seitz, Karthik Anantharaman, Piotr Starnawski, Kasper Urup Kjeldsen, Matthew B. Stott, Takuro Nunoura, Jillian F. Banfield, Andreas Schramm, Brett J. Baker, Anja Spang, Thijs J. G. Ettema
TL;DR: The results expand the known repertoire of ‘eukaryote-specific’ proteins in Archaea, indicating that the archaeal host cell already contained many key components that govern eukaryotic cellular complexity.
FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program
TL;DR: The new 2.0 version of FastME includes Subtree Pruning and Regrafting, while remaining as fast as NJ and providing a number of facilities: Distance estimation for DNA and proteins with various models and options, bootstrapping, and parallel computations.
Genomic evidence for reinfection with SARS-CoV-2: a case study.
Richard L. Tillett, Joel Sevinsky, Paul D. Hartley, Heather Kerwin, Natalie Crawford, Andrew Gorzalski, Chris Laverdure, Subhash C. Verma, Cyprian C. Rossetto, David Jackson, Megan J Farrell, Stephanie Van Hooser, Mark Pandori
TL;DR: The findings suggest that the patient was infected by SARS-CoV-2 on two separate occasions by a genetically distinct virus, and thus that previous exposure to SARS-CoV-2 might not guarantee total immunity in all cases.
A Phylogenomic View of Ecological Specialization in the Lachnospiraceae, a Family of Digestive Tract-Associated Bacteria
Conor J. Meehan, Robert G. Beiko
TL;DR: Analysis of the genomes of 30 Lachnospiraceae isolates demonstrates that adaptation to an ecological niche and acquisition of defining functional roles within a microbiome can arise through a combination of both habitat-specific gene loss and LGT.
Origins and functional evolution of Y chromosomes across mammals
Diego Cortez, Ray M. Marín, Deborah Toledo-Flores, Laure Froidevaux, Angélica Liechti, Paul D. Waters, Frank Grützner, Henrik Kaessmann
TL;DR: Although some genes evolved novel functions through spatial/temporal expression shifts, most Y genes probably endured, at least initially, because of dosage constraints, and show notable conservation of proto-sex chromosome expression patterns.
Related Papers (5)
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
Kazutaka Katoh, Daron M. Standley