Classification of high-dimensional data using powerful tests and Bayes error rate

Mina Aminghafari (First Author, Presenting Author)
 
Tuesday, Aug 6: 10:30 AM - 12:20 PM
3892 
Contributed Posters 
Oregon Convention Center 
One of the main aims of data modeling is to find the best classifier for new cases. For example, based on the gene expression profile of a new case, we can assign it to one of two groups. The high dimensionality of the dataset is the main obstacle to finding an accurate and parsimonious model, so genes that behave similarly in the two groups are removed to reduce the dimension. Candidate genes are selected according to the family-wise error rate (FWER) and used to find the best classifier. Zhang and Deng [1] proposed an additional step: removing genes with redundant or highly correlated information before fitting the classifier. They identify more effective and non-redundant genes using the Bayes error rate (BER). Because the exact BER was not computable at the time, they estimated it with the Bhattacharyya bound. They showed that this additional step improves classification accuracy. In this work, we improve classification accuracy further by computing the exact BER [2] and by using a uniformly most powerful unbiased test [3] to control the FWER.
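To make the distinction between the Bhattacharyya bound and the exact BER concrete, the sketch below computes both for a single gene modeled as two univariate Gaussian classes with equal priors. This is an illustrative assumption, not the authors' actual method or data: the closed-form Bhattacharyya bound always sits at or above the exact BER, which here is obtained by numerically integrating the minimum of the two class-weighted densities.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bhattacharyya_ber_bound(mu1, s1, mu2, s2, p1=0.5):
    """Upper bound on the Bayes error rate via the Bhattacharyya distance
    between two univariate Gaussians (illustrative assumption)."""
    db = (0.25 * (mu1 - mu2) ** 2 / (s1 ** 2 + s2 ** 2)
          + 0.5 * np.log((s1 ** 2 + s2 ** 2) / (2 * s1 * s2)))
    return np.sqrt(p1 * (1 - p1)) * np.exp(-db)

def exact_ber(mu1, s1, mu2, s2, p1=0.5):
    """Exact Bayes error rate: integral of min(p1*f1, p2*f2) over x,
    approximated on a fine grid covering both densities."""
    lo = min(mu1 - 8 * s1, mu2 - 8 * s2)
    hi = max(mu1 + 8 * s1, mu2 + 8 * s2)
    x = np.linspace(lo, hi, 20001)
    integrand = np.minimum(p1 * gauss_pdf(x, mu1, s1),
                           (1 - p1) * gauss_pdf(x, mu2, s2))
    return np.trapz(integrand, x)

# Two classes separated by two standard deviations.
bound = bhattacharyya_ber_bound(0.0, 1.0, 2.0, 1.0)
exact = exact_ber(0.0, 1.0, 2.0, 1.0)
```

In this example the bound (about 0.303) is nearly twice the exact BER (about 0.159), which illustrates why ranking genes by the exact BER, as proposed in the abstract, can select genes more sharply than ranking by the bound.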

Keywords

Bayes error rate

Microarray data

Gene selection

Classification

Permutation test

Uniformly most powerful unbiased test 
