39: A stochastic approach to k-nearest neighbors search using a fixed radius method

Alexander Alsup Co-Author
 
Jeffrey Thompson Co-Author
Department of Biostatistics and Data Science at KUMC
 
Devin Koestler Co-Author
University of Kansas Medical Center
 
Brahian Cano Urrego First Author
University of Kansas Medical Center ASA Student Chapter
 
Brahian Cano Urrego Presenting Author
University of Kansas Medical Center ASA Student Chapter
 
Tuesday, Aug 5: 10:30 AM - 12:20 PM
0995 
Contributed Posters 
Music City Center 
This study aims to accelerate the k-nearest neighbors search (kNN search) by reducing the computational burden of the brute-force method while returning the same solution. Our method leverages data structures and probabilistic assumptions to enhance the scalability of the search. By focusing on the training set, we define a sample space that confines the k-nearest neighbors search to a smaller region. For each observation in the query set, a fixed-radius search is employed, with the radius stochastically linked to the desired number of neighbors. This approach allows us to find the k nearest neighbors while examining only a fraction of the training set. Through simulations and a theoretical complexity analysis, we demonstrate that our method outperforms the brute-force approach, particularly when the training and query set sample sizes are large. A benchmark on an Alzheimer's disease data set further supports this result, showing a 62.5-fold reduction in CPU time. Overall, our stochastic approach significantly reduces the computational load of kNN search while maintaining accuracy, making it a viable alternative to traditional methods for large data sets.
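The abstract describes the method only at a high level, so the snippet below is a minimal sketch of one way a stochastic fixed-radius kNN search could be organized; it is not the authors' implementation. It assumes a KD-tree (SciPy's cKDTree) as the supporting data structure, calibrates a single search radius from the k-th-neighbor distances of a random subsample of the training set (this subsample quantile is one possible "stochastic link" between the radius and k, and is an assumption), and falls back to an exact query when a ball contains fewer than k points. The function name stochastic_fixed_radius_knn and the parameters quantile and sample_frac are illustrative only.

```python
import numpy as np
from scipy.spatial import cKDTree


def stochastic_fixed_radius_knn(train, query, k, quantile=0.99, sample_frac=0.05, rng=None):
    """Sketch of a stochastic fixed-radius kNN search (illustrative, not the authors' code).

    A shared search radius is estimated from a random subsample of the training
    set so that a ball of that radius around a query point contains at least k
    training points with high probability; an exact kNN query is used only as a
    fallback when the ball is too small.
    """
    rng = np.random.default_rng(rng)
    tree = cKDTree(train)

    # Stochastic link between radius and k: sample a small fraction of the
    # training set, measure each sampled point's distance to its k-th neighbor,
    # and take a high quantile of those distances as the shared search radius.
    n = train.shape[0]
    m = min(n, max(k + 1, int(sample_frac * n)))
    sample = train[rng.choice(n, size=m, replace=False)]
    d_k, _ = tree.query(sample, k=k + 1)  # +1 because a sampled point is its own nearest neighbor
    radius = np.quantile(d_k[:, -1], quantile)

    neighbors = np.empty((query.shape[0], k), dtype=int)
    for i, q in enumerate(query):
        idx = tree.query_ball_point(q, radius)  # fixed-radius search around the query point
        if len(idx) < k:
            # Rare case: the ball holds fewer than k points, so fall back to an exact query.
            _, nn = tree.query(q, k=k)
            neighbors[i] = nn
        else:
            # Rank only the candidates inside the ball, a fraction of the training set.
            cand = np.asarray(idx)
            dist = np.linalg.norm(train[cand] - q, axis=1)
            neighbors[i] = cand[np.argsort(dist)[:k]]
    return neighbors


# Example usage on simulated data (purely illustrative).
gen = np.random.default_rng(0)
X_train = gen.standard_normal((10000, 5))
X_query = gen.standard_normal((200, 5))
nn_idx = stochastic_fixed_radius_knn(X_train, X_query, k=10, rng=0)
```

Because of the fallback, this sketch returns the same neighbors as a brute-force search; the quantile and subsample size trade a larger search ball (more candidates ranked per query) against how often the fallback is triggered.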

Keywords

kNN search

Computational efficiency

Fixed-radius search

Branch and bound

Machine learning 

Main Sponsor

Section on Statistical Learning and Data Science