A k nearest neighbour ensemble via extended neighbourhood rule and feature subsets

Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/06/2024: 1:20 PM - 1:45 PM EDT
Refereed 

Description

kNN-based ensemble methods minimise the effect of outliers by identifying the data points in the given feature space that are nearest to an unseen observation and predicting its response by majority voting. Ordinary kNN-based ensembles find the k nearest observations within a region (bounded by a sphere) for a predefined value of k. This approach, however, might not work when the test observation follows the pattern of the closest data points of the same class that lie along a path not contained in that sphere. This paper proposes a k nearest neighbour ensemble in which the neighbours are determined in k steps. Starting from the observation nearest to the test point, the algorithm identifies, at each step, the single observation closest to the one selected at the previous step. In each base learner of the ensemble, this search is carried out for k steps on a random bootstrap sample with a random subset of features selected from the feature space. The final predicted class of the test point is determined by a majority vote over the classes predicted by all base models. The new ensemble method is applied to 20 benchmark datasets and compared with other classical methods, including kNN-based models, using classification accuracy, kappa and Brier score as performance metrics. Boxplots are also used to illustrate the differences between the results of the proposed method and those of the other state-of-the-art methods. The proposed method outperformed the considered classical methods in the majority of cases. It is further assessed through a detailed simulation study.
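
The k-step search and the ensemble vote described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`extended_neighbours`, `exnrule_ensemble_predict`), the use of Euclidean distance, the default feature-subset size of roughly the square root of the number of features, the majority vote among the k chained neighbours within each base learner, and the treatment of repeated bootstrap rows as separate observations are all assumptions made for illustration.

```python
import numpy as np
from collections import Counter


def extended_neighbours(X, query, k):
    """Return indices of up to k rows of X found by chaining nearest-neighbour steps.

    Step 1 picks the row closest to `query`; each later step picks the
    not-yet-selected row closest to the row chosen at the previous step.
    Euclidean distance is an assumption of this sketch.
    """
    remaining = np.ones(len(X), dtype=bool)
    current = np.asarray(query, dtype=float)
    chain = []
    for _ in range(min(k, len(X))):
        dists = np.linalg.norm(X - current, axis=1)
        dists[~remaining] = np.inf      # never revisit a selected row
        idx = int(np.argmin(dists))
        chain.append(idx)
        remaining[idx] = False
        current = X[idx]                # the next step starts from this row
    return chain


def exnrule_ensemble_predict(X, y, query, k=3, n_models=25,
                             n_features=None, seed=0):
    """Predict the class of `query` by majority vote over base learners.

    Each base learner uses a bootstrap sample of the rows and a random
    subset of the features (hypothetical default size: about sqrt(p)).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    query = np.asarray(query, dtype=float)
    n, p = X.shape
    if n_features is None:
        n_features = max(1, int(np.sqrt(p)))
    votes = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)                     # bootstrap rows
        cols = rng.choice(p, size=n_features, replace=False)  # feature subset
        Xb, yb = X[np.ix_(rows, cols)], y[rows]
        chain = extended_neighbours(Xb, query[cols], k)
        # Assumed base-learner rule: majority class among the chained neighbours.
        votes.append(Counter(yb[chain].tolist()).most_common(1)[0][0])
    # Final prediction: majority vote over all base learners.
    return Counter(votes).most_common(1)[0][0]


# Toy data: two well-separated groups of 10 points each.
X_toy = np.array([[0.1 * i, 0.1 * i] for i in range(10)]
                 + [[10 + 0.1 * i, 10 + 0.1 * i] for i in range(10)])
y_toy = np.array([0] * 10 + [1] * 10)
pred_low = exnrule_ensemble_predict(X_toy, y_toy, [0.2, 0.3])
pred_high = exnrule_ensemble_predict(X_toy, y_toy, [10.5, 10.5])
```

On the one-dimensional points 1, 2, 3 and -1.5 with a test point at 0 and k = 3, an ordinary kNN search would select 1, -1.5 and 2, whereas the chained search in this sketch selects 1, 2 and 3, following the path of consecutive closest points.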

Keywords

Feature subsets

Nearest Neighbours Rule

kNN Ensemble

Classification

Ensemble learning 

Presenting Author

Saeed Aldahmani

First Author

Saeed Aldahmani

CoAuthor(s)

Zardad Khan
Naz Gul, Abdul Wali Khan University
Amjad Ali, United Arab Emirates University

Tracks

Statistical Data Science
Symposium on Data Science and Statistics (SDSS) 2024