50 Classification of high-dimensional data using powerful tests and Bayes error rate
Tuesday, Aug 6: 10:30 AM - 12:20 PM
3892
Contributed Posters
Oregon Convention Center
One of the main aims of data modeling is to find the best classifier for new cases; for example, based on the gene expression profile of a new case, we can assign it to one of two groups. The high dimensionality of such datasets is the main obstacle to finding an accurate, parsimonious model, so genes that behave similarly in the two groups are removed to reduce the dimension. Candidate genes are selected by controlling the family-wise error rate (FWER) and are then used to build the classifier. Zhang and Deng [1] proposed an additional step: removing genes with redundant or highly correlated information before building the classifier. They identified more effective, non-redundant genes using the Bayes error rate (BER), which they estimated via the Bhattacharyya bound because the exact BER was not computable at the time, and they showed that this additional step improves classification accuracy. In this work, we further improve classification accuracy by computing the exact BER [2] and by using the uniformly most powerful unbiased test [3] to control the FWER.
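As a rough illustration of the Bhattacharyya bound mentioned above: for two classes with prior probabilities p1 and p2 and Gaussian class-conditional densities, the BER is bounded above by sqrt(p1*p2)*exp(-D_B), where D_B is the Bhattacharyya distance. The sketch below is a hypothetical univariate (per-gene) version, not the authors' implementation; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def bhattacharyya_bound(mu1, var1, mu2, var2, p1=0.5):
    """Upper bound on the Bayes error rate for two univariate Gaussian
    class-conditional densities N(mu1, var1) and N(mu2, var2).
    Illustrative sketch only; names and defaults are assumptions."""
    var_avg = 0.5 * (var1 + var2)
    # Bhattacharyya distance between the two Gaussians
    d_b = (mu1 - mu2) ** 2 / (8.0 * var_avg) \
        + 0.5 * np.log(var_avg / np.sqrt(var1 * var2))
    p2 = 1.0 - p1
    # Bhattacharyya bound on the Bayes error rate
    return np.sqrt(p1 * p2) * np.exp(-d_b)
```

When the two class densities coincide, the bound equals 0.5 (no gene is informative); as the class means separate, the bound shrinks toward zero, which is why a small bound flags a gene as discriminative.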
Bayes error rate
Microarray data
Gene selection
Classification
Permutation test
Uniformly most powerful unbiased test