39: A stochastic approach to k-nearest neighbors search using a fixed radius method

Alexander Alsup Co-Author
 
Jeffrey Thompson Co-Author
Department of Biostatistics and Data Science at KUMC
 
Devin Koestler Co-Author
University of Kansas Medical Center
 
Brahian Cano Urrego First Author
University of Kansas Medical Center ASA Student Chapter
 
Brahian Cano Urrego Presenting Author
University of Kansas Medical Center ASA Student Chapter
 
Tuesday, Aug 5: 10:30 AM - 12:20 PM
0995 
Contributed Posters 
Music City Center 
This study aims to accelerate the k-nearest neighbors search (kNN search) by reducing the computational burden of the brute-force method while returning the same solution. Our method leverages data structures and probabilistic assumptions to enhance the scalability of the search. By focusing on the training set, we define a sample space that confines the k-nearest neighbors search to a smaller region. For each observation in the query set, a fixed-radius search is employed, with the radius stochastically linked to the desired number of neighbors. This approach allows us to find the k nearest neighbors while examining only a fraction of the training set. Through simulations and a theoretical complexity analysis, we demonstrate that our method outperforms the brute-force approach, particularly when the training and query set sample sizes are large. A benchmark on an Alzheimer's disease data set further supports this result, showing a 62.5-fold reduction in CPU time. Overall, our stochastic approach significantly reduces the computational load of kNN search while maintaining accuracy, making it a viable alternative to traditional methods for large data sets.
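The abstract describes the method only at a high level, so the snippet below is a minimal sketch of one way a stochastic fixed-radius kNN search could be organized; it is not the authors' implementation. It assumes a KD-tree (SciPy's cKDTree) as the supporting data structure, calibrates a single search radius from the k-th-neighbor distances of a random subsample of the training set (this subsample quantile is one possible "stochastic link" between the radius and k, and is an assumption), and falls back to an exact query when a ball contains fewer than k points. The function name stochastic_fixed_radius_knn and the parameters quantile and sample_frac are illustrative only.

```python
import numpy as np
from scipy.spatial import cKDTree


def stochastic_fixed_radius_knn(train, query, k, quantile=0.99, sample_frac=0.05, rng=None):
    """Sketch of a stochastic fixed-radius kNN search (illustrative, not the authors' code).

    A shared search radius is estimated from a random subsample of the training
    set so that a ball of that radius around a query point contains at least k
    training points with high probability; an exact kNN query is used only as a
    fallback when the ball is too small.
    """
    rng = np.random.default_rng(rng)
    tree = cKDTree(train)

    # Stochastic link between radius and k: sample a small fraction of the
    # training set, measure each sampled point's distance to its k-th neighbor,
    # and take a high quantile of those distances as the shared search radius.
    n = train.shape[0]
    m = min(n, max(k + 1, int(sample_frac * n)))
    sample = train[rng.choice(n, size=m, replace=False)]
    d_k, _ = tree.query(sample, k=k + 1)  # +1 because a sampled point is its own nearest neighbor
    radius = np.quantile(d_k[:, -1], quantile)

    neighbors = np.empty((query.shape[0], k), dtype=int)
    for i, q in enumerate(query):
        idx = tree.query_ball_point(q, radius)  # fixed-radius search around the query point
        if len(idx) < k:
            # Rare case: the ball holds fewer than k points, so fall back to an exact query.
            _, nn = tree.query(q, k=k)
            neighbors[i] = nn
        else:
            # Rank only the candidates inside the ball, a fraction of the training set.
            cand = np.asarray(idx)
            dist = np.linalg.norm(train[cand] - q, axis=1)
            neighbors[i] = cand[np.argsort(dist)[:k]]
    return neighbors


# Example usage on simulated data (purely illustrative).
gen = np.random.default_rng(0)
X_train = gen.standard_normal((10000, 5))
X_query = gen.standard_normal((200, 5))
nn_idx = stochastic_fixed_radius_knn(X_train, X_query, k=10, rng=0)
```

Because of the fallback, this sketch returns the same neighbors as a brute-force search; the quantile and subsample size trade a larger search ball (more candidates ranked per query) against how often the fallback is triggered.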

Keywords

kNN search

Computational efficiency

Fixed-radius search

Branch and bound

Machine learning 

Main Sponsor

Section on Statistical Learning and Data Science