Journal (ISSN: 1748-5673)

International Journal of Data Mining and Bioinformatics 

Inderscience Publishers
About: International Journal of Data Mining and Bioinformatics is an academic journal published by Inderscience Publishers. The journal publishes mainly in the areas of computer science and feature selection. It has the ISSN identifier 1748-5673. Over its lifetime, it has published 117 papers, which have received 575 citations. The journal is also known as IJDMB.


Papers
Journal Article
TL;DR: The bat-inspired algorithm (BA) is tailored to gene selection for cancer classification using microarray datasets; it achieves results comparable to existing methods on some datasets and produces new results for one dataset.
Abstract: In this paper, the bat-inspired algorithm (BA) is tailored to gene selection for cancer classification using microarray datasets. Microarray data include irrelevant, redundant, and noisy genes. The gene selection problem is addressed by identifying the most informative genes in the microarray data so that cancer can be diagnosed accurately, and it is widely solved with optimisation algorithms. BA is a recent swarm-based algorithm that imitates the echolocation behaviour of bats, and it has been successfully applied to several optimisation problems. Here, gene selection combines two stages: a filter stage, which uses the Minimum Redundancy Maximum Relevancy (MRMR) method, and a wrapper stage, which uses BA together with a support vector machine (SVM). Ten microarray datasets were used to evaluate the accuracy of the proposed method, and for comparative evaluation it was compared with popular gene selection methods. The proposed method achieves comparable results on some datasets and produces new results for one dataset.

53 citations
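The MRMR filter stage can be illustrated with the common greedy "difference" form: start from the gene with the highest mutual information with the class label, then repeatedly add the gene whose relevance minus its mean mutual information with the already selected genes is largest. This is a minimal sketch, not the authors' implementation: the median-based binarisation and the helper names (`discretise`, `mutual_information`, `mrmr_select`) are assumptions, and the BA + SVM wrapper stage is not reproduced.

```python
import numpy as np


def discretise(X):
    """Binarise each gene (column) at its median; an assumed scheme, the paper may differ."""
    return (X > np.median(X, axis=0)).astype(int)


def mutual_information(x, y):
    """Mutual information, in nats, between two discrete 1-D arrays."""
    _, x_idx = np.unique(x, return_inverse=True)
    _, y_idx = np.unique(y, return_inverse=True)
    x_idx, y_idx = x_idx.ravel(), y_idx.ravel()
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1)          # contingency table of (x, y) pairs
    joint /= joint.sum()                         # joint probabilities
    expected = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0                               # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / expected[nz])))


def mrmr_select(X_disc, y, k):
    """Greedy MRMR (difference form): maximise relevance minus mean redundancy."""
    n_genes = X_disc.shape[1]
    relevance = np.array([mutual_information(X_disc[:, j], y) for j in range(n_genes)])
    selected = [int(np.argmax(relevance))]      # start with the most relevant gene
    while len(selected) < k:
        candidates = [j for j in range(n_genes) if j not in selected]
        scores = [
            relevance[j]
            - np.mean([mutual_information(X_disc[:, j], X_disc[:, s]) for s in selected])
            for j in candidates
        ]
        selected.append(candidates[int(np.argmax(scores))])
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 20)                    # toy labels: two classes, 20 samples each
    signal = y + 0.1 * rng.standard_normal(40)   # one gene that tracks the class label
    X = np.column_stack([rng.standard_normal(40), signal,
                         rng.standard_normal(40), rng.standard_normal(40)])
    print(mrmr_select(discretise(X), y, k=2))    # the first index is the informative gene, 1
```

In a filter-then-wrapper pipeline like the one described, the genes returned by such a filter would form the reduced search space that the wrapper (here BA with an SVM) explores.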

Journal Article
TL;DR: PhenoSim is a new similarity measure that includes a noise reduction component to model noisy patient phenotype data and a path-constrained Information Content-based method for measuring phenotype semantic similarity. It could effectively improve the performance of HPO-based phenotype similarity measurement, thus increasing the accuracy of phenotype-based causative gene prediction and disease prediction.
Abstract: Making a precise disease diagnosis from complex clinical features and a highly heterogeneous genetic background is critical, yet it remains challenging. Recently, phenotype similarity has been applied effectively to model patient phenotype data. However, existing measures are adapted from Gene Ontology-based term similarity models, which are not optimised for human phenotype ontologies. We propose a new similarity measure called PhenoSim. Our model includes a noise reduction component to model noisy patient phenotype data, and a path-constrained Information Content-based method for measuring phenotype semantic similarity. The evaluation, which compared PhenoSim with four existing approaches, showed that PhenoSim could effectively improve the performance of Human Phenotype Ontology (HPO)-based phenotype similarity measurement, thus increasing the accuracy of phenotype-based causative gene prediction and disease prediction.

45 citations
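Information Content (IC)-based term similarity, the family PhenoSim builds on, can be sketched with the classic Resnik measure: IC(t) = log(N / n_t), where n_t counts the annotated objects carrying t or any of its descendants, and the similarity of two terms is the IC of their most informative common ancestor. This is plain Resnik similarity, not PhenoSim: the noise reduction component and the path constraint are not reproduced, and the toy ontology and disease annotations below are hypothetical.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (is_a edges)
PARENTS = {
    "root": set(),
    "abnormality_of_limb": {"root"},
    "abnormality_of_hand": {"abnormality_of_limb"},
    "polydactyly": {"abnormality_of_hand"},
    "syndactyly": {"abnormality_of_hand"},
    "seizure": {"root"},
}

# Hypothetical disease annotations: disease -> set of phenotype terms
ANNOTATIONS = {
    "disease_A": {"polydactyly"},
    "disease_B": {"syndactyly"},
    "disease_C": {"seizure"},
    "disease_D": {"polydactyly", "seizure"},
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    result, stack = {term}, [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def information_content(annotations, parents):
    """IC(t) = log(N / n_t); an annotation to t also counts for all its ancestors."""
    counts = {t: 0 for t in parents}
    for terms in annotations.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t, parents)
        for t in covered:
            counts[t] += 1
    n = len(annotations)
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic, parents):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max(ic.get(t, 0.0) for t in common)


if __name__ == "__main__":
    ic = information_content(ANNOTATIONS, PARENTS)
    print(resnik("polydactyly", "syndactyly", ic, PARENTS))  # log(4/3), shared hand ancestor
    print(resnik("polydactyly", "seizure", ic, PARENTS))     # 0.0, only the root is shared
```

Because only the deepest shared ancestor matters, two hand anomalies score higher than a hand anomaly and a seizure, which is the behaviour a phenotype similarity measure needs before any noise handling is added.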

Journal Article
TL;DR: ICA + ABC is a promising approach for solving gene selection and cancer classification problems using microarray data; the experimental results show that the proposed algorithm gives a more accurate classification rate for the ANN classifier.
Abstract: This paper proposes a new combined feature selection/extraction approach for Artificial Neural Network (ANN) classification of high-dimensional microarray data, which uses Independent Component Analysis (ICA) as the extraction technique and Artificial Bee Colony (ABC) as the optimisation technique. The study evaluates the proposed ICA + ABC algorithm through extensive experiments on five binary and one multi-class gene expression microarray datasets, and compares it with ICA and ABC alone. The proposed method shows superior performance, achieving the highest classification accuracy together with the lowest average number of selected genes. Furthermore, the present work compares the proposed ICA + ABC algorithm with popular filter techniques and with other similar bio-inspired algorithms combined with ICA. The experimental results show that the proposed algorithm gives a more accurate classification rate for the ANN classifier. Therefore, ICA + ABC is a promising approach for solving gene selection and cancer classification problems using microarray data.

42 citations
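The optimisation side of ICA + ABC relies on the Artificial Bee Colony search loop: employed bees perturb their food sources, onlooker bees revisit sources in proportion to fitness, and scouts abandon sources that stop improving. The sketch below is a basic continuous ABC minimising a toy function; it is an assumption-laden illustration, not the paper's method. The function name `abc_minimise`, its parameters, and the sphere objective are hypothetical, and in a gene selection setting `f` would instead be something like the ANN error on a candidate subset of ICA components.

```python
import numpy as np


def abc_minimise(f, dim, bounds, n_food=10, limit=50, max_iter=200, seed=0):
    """Minimise a non-negative f over the box [lo, hi]^dim with a basic ABC."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    foods = rng.uniform(lo, hi, size=(n_food, dim))   # one food source per employed bee
    costs = np.array([f(x) for x in foods])
    trials = np.zeros(n_food, dtype=int)              # iterations without improvement

    def try_neighbour(i):
        # v_id = x_id + phi * (x_id - x_kd) for one random dimension d and partner k != i
        k = rng.choice([j for j in range(n_food) if j != i])
        d = rng.integers(dim)
        candidate = foods[i].copy()
        step = rng.uniform(-1.0, 1.0) * (foods[i, d] - foods[k, d])
        candidate[d] = np.clip(candidate[d] + step, lo, hi)
        c = f(candidate)
        if c < costs[i]:                               # greedy replacement
            foods[i], costs[i], trials[i] = candidate, c, 0
        else:
            trials[i] += 1

    best_i = int(np.argmin(costs))
    best_x, best_c = foods[best_i].copy(), costs[best_i]
    for _ in range(max_iter):
        for i in range(n_food):                        # employed bee phase
            try_neighbour(i)
        fitness = 1.0 / (1.0 + costs)                  # standard transform, assumes f >= 0
        for i in rng.choice(n_food, size=n_food, p=fitness / fitness.sum()):
            try_neighbour(i)                           # onlooker bee phase
        i = int(np.argmin(costs))
        if costs[i] < best_c:                          # memorise the best source so far
            best_x, best_c = foods[i].copy(), costs[i]
        i = int(np.argmax(trials))
        if trials[i] > limit:                          # scout bee phase: abandon the source
            foods[i] = rng.uniform(lo, hi, size=dim)
            costs[i] = f(foods[i])
            trials[i] = 0
    return best_x, float(best_c)


if __name__ == "__main__":
    x, c = abc_minimise(lambda v: float(np.sum(v ** 2)), dim=3, bounds=(-5.0, 5.0))
    print(x, c)                                        # x close to the origin, c close to 0
```

The best source is memorised separately because the scout phase may discard a stagnant source that happens to be the current best.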

Journal Article
TL;DR: A neural network and a naive Bayesian model are employed as classifiers, and the novel feature combines the joint appearance of adjacent amino acids with the BLOSUM62 matrix.
Abstract: Post-translational modification of proteins is one of the most important biological processes in the fields of proteomics and bioinformatics. Pupylation is a novel post-translational modification in which the small, intrinsically disordered prokaryotic ubiquitin-like protein (Pup) is conjugated to lysine residues of potential segments. Predicting such modification sites has proved challenging for both experimental and computational methods. Computational methods mainly aim to extract effective features from the potential protein segments. In this paper, a statistical feature of adjacent amino acid residues is proposed; this novel feature combines the joint appearance of adjacent amino acids with the BLOSUM62 matrix. A neural network and a naive Bayesian model are employed as the classification models in this work. Such a model will also be used to address many other problems in computational biology.

39 citations
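One plausible reading of the "statistical feature of adjacent amino acid residues" is the adjacent-pair (dipeptide) composition of a candidate segment around a lysine. The sketch below computes only that 400-dimensional frequency vector; the exact formulation, the window length, and how the paper combines it with BLOSUM62 are not stated in the abstract, so they are left out, and the names `adjacent_pair_composition` and `PAIR_INDEX` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                 # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {p: i for i, p in enumerate(PAIRS)}      # "AA" -> 0, ..., "YY" -> 399


def adjacent_pair_composition(segment):
    """Frequencies of the 400 adjacent residue pairs in a protein segment.

    Pairs containing non-standard symbols (e.g. 'X' padding at sequence ends)
    are skipped, so the returned frequencies sum to 1 when any valid pair exists.
    """
    counts = [0.0] * len(PAIRS)
    total = 0
    for a, b in zip(segment, segment[1:]):           # consecutive residue pairs
        idx = PAIR_INDEX.get(a + b)
        if idx is not None:
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else counts


if __name__ == "__main__":
    v = adjacent_pair_composition("AKAK")             # pairs: AK, KA, AK
    print(v[PAIR_INDEX["AK"]], v[PAIR_INDEX["KA"]])   # 2/3 and 1/3
```

A vector like this, possibly concatenated with substitution-matrix features, could then be fed to a classifier such as naive Bayes or a neural network.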

Journal Article
TL;DR: A new cluster validity index (the ARPoints index) is proposed for cluster validation, along with a new approach to determining the compactness and distinctness measures of clusters.
Abstract: Elucidating the patterns hidden in gene expression data offers an opportunity to identify co-expressed genes and biologically relevant groupings of genes. However, the large number of genes and the complexity of biological networks make microarray data much harder to comprehend and interpret. A first step toward addressing this challenge is the use of clustering techniques, and validating the results obtained from a clustering algorithm is an important part of the clustering process. In this paper, we propose a new cluster validity index (the ARPoints index) for cluster validation, together with a new approach to determining the compactness and distinctness measures of clusters. We revisit commonly known indices, compare them thoroughly with the proposed index, and summarise the performance of the different indices. Experimental results show that the proposed index performs better than the commonly known cluster validity indices.

21 citations
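The ARPoints formula is not given in the abstract, but the compactness-versus-distinctness idea behind cluster validity indices can be illustrated with the widely used Dunn index: the smallest distance between points in different clusters divided by the largest within-cluster diameter. This is a sketch of Dunn's index, not ARPoints, and it is not necessarily one of the indices compared in the paper; the function name `dunn_index` is hypothetical and at least two clusters are assumed.

```python
import numpy as np


def dunn_index(X, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter.

    Higher values indicate compact, well-separated clusters. Assumes at least
    two clusters and at least one cluster with two or more points.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = [X[labels == c] for c in np.unique(labels)]

    def pairwise(a, b):
        # Euclidean distances between every point of a and every point of b
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

    # Compactness: the widest cluster (largest distance between two members)
    max_diameter = max(pairwise(c, c).max() for c in clusters)
    # Distinctness: the closest pair of points belonging to different clusters
    min_separation = min(
        pairwise(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [10, 0], [10, 1]]
    print(dunn_index(X, [0, 0, 1, 1]))   # 10.0: two tight, distant clusters
    print(dunn_index(X, [0, 1, 0, 1]))   # 0.1: the same points, poorly grouped
```

Comparing the two labellings shows how such an index ranks alternative clusterings of the same data without needing ground-truth labels.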

Performance Metrics
Number of papers published by the journal in previous years

Year    Papers
2023    1
2022    28
2020    12
2019    12
2018    27
2017    42