Journal Article

Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology

Kuo-Chen Chou
30 Nov 2009, Vol. 6, Iss. 4, pp. 262-274
TLDR
The pseudo amino acid (PseAA) composition of a protein is a set of discrete numbers derived from its amino acid sequence that, unlike the classical amino acid composition, retains some sequence-order or pattern information.
Abstract
With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop automated methods for efficiently identifying various attributes of uncharacterized proteins. This is one of the most important tasks facing us today in bioinformatics, and the information thus obtained will have important impacts on the development of proteomics and system biology. One of the keys to achieving this is to find an effective model to represent a protein sample. The most straightforward model in this regard is the entire amino acid sequence; however, the entire-sequence model fails when the query protein has no significant homology to proteins of known characteristics. Thus, various non-sequential or discrete models were proposed. The simplest discrete model is the amino acid (AA) composition. When it is used to represent a protein, however, all the sequence-order information is completely lost. To cope with this dilemma, the concept of pseudo amino acid (PseAA) composition was introduced. Its essence is to keep using a discrete model to represent a protein without completely losing its sequence-order information. Therefore, in a broad sense, the PseAA composition of a protein is a set of discrete numbers that is derived from its amino acid sequence, that differs from the classical AA composition, and that is able to harbour some sort of sequence-order or pattern information. Ever since the first PseAA composition was formulated to predict protein subcellular localization and membrane protein types, it has stimulated many different modes of PseAA composition for studying various kinds of problems in proteins and protein-related systems. In this review, we give a brief and systematic introduction to the various modes of PseAA composition and their applications. The challenges of finding the optimal PseAA composition are also briefly discussed.
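The contrast between the two discrete models can be illustrated with a minimal Python sketch. It is not taken from the review: the function names, the default parameters `lam` and `weight`, and the use of the Kyte-Doolittle hydropathy scale as the single physicochemical property are all assumptions made for illustration. The sequence-order terms follow the general idea of the first PseAA formulation (mean squared property differences between residues `k` positions apart, appended to the 20 AA frequencies and jointly normalized), simplified here to one property.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used only as an illustrative property
# (an assumption; other modes of PseAA composition use other properties).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aa_composition(seq):
    """Classical AA composition: 20 frequencies, no order information."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    vals = [scale[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


def pse_aa_composition(seq, lam=2, weight=0.05, scale=KD):
    """Simplified PseAA composition: 20 + lam numbers.

    Assumes seq contains only the 20 standard residues and len(seq) > lam.
    """
    if lam >= len(seq):
        raise ValueError("lam must be smaller than the sequence length")
    h = _standardize(scale)
    freqs = aa_composition(seq)
    # theta_k: mean squared property difference between residues k apart.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((h[a] - h[b]) ** 2 for a, b in zip(seq, seq[k:]))
        thetas.append(total / (len(seq) - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

For example, the toy sequences `"AALL"` and `"ALAL"` have identical `aa_composition` vectors, yet their `pse_aa_composition` vectors differ because the extra `lam` components depend on which residues sit next to each other.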


Citations

Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization

TL;DR: A new predictor called “Plant-mPLoc” is developed by integrating the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition that has the capacity to deal with multiple-location proteins beyond the reach of any existing predictors specialized for identifying plant protein subcellular localization.

Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences

TL;DR: This article proposes a much more flexible web server called Pse-in-One, which can, through its 28 different modes, generate nearly all the possible feature vectors for DNA, RNA and protein sequences, and can also generate those feature vectors with the properties defined by users themselves.

Impacts of bioinformatics to medicinal chemistry.

TL;DR: This minireview is to summarize the progresses by focusing on the following six aspects: Use the pseudo amino acid composition or PseAAC to predict various attributes of protein/peptide sequences that are useful for drug development.

propy: a tool to generate various modes of Chou’s PseAAC

TL;DR: propy (protein in Python) is a freely available, open-source Python package for calculating the widely used structural and physicochemical features of proteins and peptides from their amino acid sequences; it can also compute these descriptors from user-defined properties, which are automatically available from the AAindex database.

iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition.

TL;DR: It was observed that the overall cross-validation success rate achieved by iSNO-PseAAC in identifying nitrosylated proteins on an independent dataset was over 90%, indicating that the new predictor is quite promising.