Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega
Fabian Sievers, Andreas Wilm, David Dineen, Toby J. Gibson, Kevin Karplus, Weizhong Li, Rodrigo Lopez, Hamish McWilliam, Michael Remmert, Johannes Söding, Julie D. Thompson, Desmond G. Higgins
TLDR
A new program called Clustal Omega is described, which can align virtually any number of protein sequences quickly, delivers accurate alignments, and, on larger data sets, outperforms other packages in both execution time and quality.
Abstract
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and deliver accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.
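As a rough illustration of the two use cases the abstract mentions (aligning a set of sequences from scratch, and adding sequences to an existing alignment), the sketch below assembles a `clustalo` command line in Python. It is a minimal sketch, not the authors' recommended workflow: the helper name `build_clustalo_command` and all file names are hypothetical, and it assumes the `clustalo` executable is installed and accepts the `-i`, `-o`, `--outfmt`, `--threads` and `--profile1` options from its command-line help.

```python
from typing import List, Optional


def build_clustalo_command(
    sequences: str,
    output: str,
    existing_alignment: Optional[str] = None,
    threads: int = 1,
    outfmt: str = "fasta",
) -> List[str]:
    """Assemble an argument list for the clustalo executable.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    cmd = [
        "clustalo",
        "-i", sequences,          # input file with the sequences to align
        "-o", output,             # file the alignment is written to
        f"--outfmt={outfmt}",     # output format, e.g. fasta or stockholm
        f"--threads={threads}",   # number of processors to use
    ]
    if existing_alignment is not None:
        # Assumption: an existing alignment (for example a Pfam seed
        # alignment) is supplied as a profile via --profile1.
        cmd.append(f"--profile1={existing_alignment}")
    return cmd


# Align from scratch (file names are placeholders).
print(build_clustalo_command("new.fa", "out.fa"))

# Combine new sequences with an existing alignment.
print(build_clustalo_command("new.fa", "out.sto",
                             existing_alignment="seed.sto",
                             threads=4, outfmt="stockholm"))

# To actually run the aligner, assuming clustalo is on PATH:
#   import subprocess
#   subprocess.run(build_clustalo_command("new.fa", "out.fa"), check=True)
```

Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.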
Citations
Journal Article
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
Kazutaka Katoh, Daron M. Standley
TL;DR: This version of MAFFT has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update.
Journal Article
Deciphering key features in protein structures with the new ENDscript server
Xavier Robert, Patrice Gouet
TL;DR: This major upgrade has been fully re-engineered to enhance speed, accuracy and usability with interactive 3D visualization, and ENDscript 2 and ESPript 3 handle large amounts of data with reduced computation time.
Journal Article
MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization
TL;DR: This article explains the web interface for recently developed MAFFT options for large data sets and for interactive use to refine sequence data sets and MSAs.
Journal Article
Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus.
TL;DR: These analyses provide insights into the receptor usage, cell entry, host cell infectivity and animal origin of 2019-nCoV and may help epidemic surveillance and preventive measures against 2019-nCoV.
Journal Article
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing.
David E. Gordon, Gwendolyn M. Jang, Mehdi Bouhaddou, Jiewei Xu, Kirsten Obernier, Kris M. White, Matthew J. O’Meara, Veronica V. Rezelj, Jeffrey Z. Guo, Danielle L. Swaney, Tia A. Tummino, Ruth Hüttenhain, Robyn M. Kaake, Alicia L. Richards, Beril Tutuncuoglu, Helene Foussard, Jyoti Batra, Kelsey M. Haas, Maya Modak, Minkyu Kim, Paige Haas, Benjamin J. Polacco, Hannes Braberg, Jacqueline M. Fabius, Manon Eckhardt, Margaret Soucheray, Melanie J. Bennett, Merve Cakir, Michael McGregor, Qiongyu Li, Bjoern Meyer, Ferdinand Roesch, Thomas Vallet, Alice Mac Kain, Lisa Miorin, Elena Moreno, Zun Zar Chi Naing, Yuan Zhou, Shiming Peng, Ying Shi, Ziyang Zhang, Wenqi Shen, Ilsa T Kirby, James E. Melnyk, John S. Chorba, Kevin Lou, Shizhong Dai, Inigo Barrio-Hernandez, Danish Memon, Claudia Hernandez-Armenta, Jiankun Lyu, Christopher J.P. Mathy, Tina Perica, Kala Bharath Pilla, Sai J. Ganesan, Daniel J. Saltzberg, Rakesh Ramachandran, Xi Liu, Sara Brin Rosenthal, Lorenzo Calviello, Srivats Venkataramanan, Jose Liboy-Lugo, Yizhu Lin, Xi Ping Huang, Yongfeng Liu, Stephanie A. Wankowicz, Markus Bohn, Maliheh Safari, Fatima S. Ugur, Cassandra Koh, Nastaran Sadat Savar, Quang Dinh Tran, Djoshkun Shengjuler, Sabrina J. Fletcher, Michael C. O’Neal, Yiming Cai, Jason C.J. Chang, David J. Broadhurst, Saker Klippsten, Phillip P. Sharp, Nicole A. Wenzell, Duygu Kuzuoğlu-Öztürk, Hao-Yuan Wang, Raphael Trenker, Janet M. Young, Devin A. Cavero, Joseph Hiatt, Theodore L. Roth, Ujjwal Rathore, Advait Subramanian, Julia Noack, Mathieu Hubert, Robert M. Stroud, Alan D. Frankel, Oren S. Rosenberg, Kliment A. Verba, David A. Agard, Melanie Ott, Michael Emerman, Natalia Jura, Mark von Zastrow, Eric Verdin, Alan Ashworth, Olivier Schwartz, Christophe d'Enfert, Shaeri Mukherjee, Matthew P. Jacobson, Harmit S. Malik, Danica Galonić Fujimori, Trey Ideker, Charles S. Craik, Stephen N. Floor, James S. Fraser, John D. Gross, Andrej Sali, Bryan L. Roth, Davide Ruggero, Jack Taunton, Tanja Kortemme, Pedro Beltrao, Marco Vignuzzi, Adolfo García-Sastre, Kevan M. Shokat, Brian K. Shoichet, Nevan J. Krogan
TL;DR: A human–SARS-CoV-2 protein interaction map highlights cellular processes that are hijacked by the virus and that can be targeted by existing drugs, including inhibitors of mRNA translation and predicted regulators of the sigma receptors.
References
Journal Article
MUSCLE: multiple sequence alignment with high accuracy and high throughput
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
Journal Article
Clustal W and Clustal X version 2.0
Mark A. Larkin, Gordon Blackshields, Nigel P. Brown, R. Chenna, Paul A. McGettigan, Hamish McWilliam, Franck Valentin, Iain M. Wallace, Andreas Wilm, Rodrigo Lopez, J.D. Thompson, Toby J. Gibson, Desmond G. Higgins
TL;DR: The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++ to facilitate the further development of the alignment algorithms in the future, which has also allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems.
Journal Article
The Pfam protein families database
Marco Punta, Penny Coggill, Ruth Y. Eberhardt, Jaina Mistry, John Tate, Chris Boursnell, Ningze Pang, Kristoffer Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa Holm, Erik L. L. Sonnhammer, Sean R. Eddy, Alex Bateman, Robert D. Finn
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Journal Article
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
TL;DR: A simplified scoring system is proposed that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length.
Journal Article
Pfam: the protein families database.
Robert D. Finn, Alex Bateman, Jody Clements, Penelope Coggill, Ruth Y. Eberhardt, Sean R. Eddy, Andreas Heger, Kirstie Hetherington, Liisa Holm, Jaina Mistry, Erik L. L. Sonnhammer, John Tate, Marco Punta
TL;DR: Pfam is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0, and has been updated several times since 2012.