Monte Carlo feature selection for supervised classification
- PMID: 18048398
- DOI: 10.1093/bioinformatics/btm486
Abstract
Motivation: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. It is desirable that feature selection provides the features that contribute most to the classification task per se and which should therefore be used by any classifier later used to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on repeated construction of a tree classifier for many training sets randomly chosen from the original sample set, where the samples in each training set are described by only a fraction of all of the observed features.
Results: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using the Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, the list of genes we obtained differed significantly from those produced by other methods. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma rather than being genes that are common to several forms of cancer, which is the case for the other methods.
Availability: Prototype available upon request.
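
The resampling idea described in the Motivation can be illustrated with a minimal sketch. It is not the authors' prototype: a depth-one decision stump stands in for the tree classifier, the score credited to a feature is simply the held-out accuracy of the stumps that split on it, and the function names (`fit_stump`, `mc_feature_ranking`), the parameters, and the synthetic data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def fit_stump(rows, labels, features):
    """Fit a depth-one tree restricted to `features`.

    Returns (feature, threshold, left_label, right_label) with the fewest
    training errors, or None if no feature can split the rows.
    """
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Majority label on each side (sorted for deterministic ties).
            l_lab = max(sorted(set(left)), key=left.count)
            r_lab = max(sorted(set(right)), key=right.count)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:] if best else None


def mc_feature_ranking(rows, labels, n_subsets=200, subset_size=2,
                       n_splits=5, train_frac=0.66, seed=0):
    """Rank features by accumulated held-out accuracy (hypothetical score)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    score = defaultdict(float)
    idx = list(range(len(rows)))
    for _ in range(n_subsets):
        # Each round sees only a random fraction of the features.
        features = rng.sample(range(n_features), subset_size)
        for _ in range(n_splits):
            # Random training set drawn from the original samples.
            rng.shuffle(idx)
            cut = int(train_frac * len(idx))
            train, test = idx[:cut], idx[cut:]
            stump = fit_stump([rows[i] for i in train],
                              [labels[i] for i in train], features)
            if stump is None:
                continue
            f, t, l_lab, r_lab = stump
            hits = sum((l_lab if rows[i][f] <= t else r_lab) == labels[i]
                       for i in test)
            score[f] += hits / len(test)
    return sorted(range(n_features), key=lambda f: score[f], reverse=True)


# Synthetic data (assumption): feature 0 separates the classes, 1-4 are noise.
data_rng = random.Random(1)
labels = [i % 2 for i in range(40)]
rows = [[2 * y + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(4)]
        for y in labels]

ranking = mc_feature_ranking(rows, labels)
print(ranking)
```

On this synthetic data the informative feature (index 0) is expected to come out at the top of the ranking, since stumps that split on it generalize to the held-out samples while stumps on noise features perform near chance.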