Nonlinear outlier detection leveraging high dimensionality of kernel-induced feature space

Jeongyoun Ahn, Co-Author
Korea Advanced Institute of Science and Technology

Giwon Kim, First Author and Presenting Author

Tuesday, Aug 6: 9:35 AM - 9:40 AM
2529 
Contributed Speed 
Oregon Convention Center 
Classical linear methods frequently exhibit limited effectiveness for detecting outliers in real-world datasets. To overcome this limitation, we propose a nonlinear outlier detection method that exploits the high dimensionality of the kernel-induced feature space. When the data dimension exceeds the sample size, the orthogonal distance between each data point and the hyperplane spanned by the other data points, the distance to hyperplane (DH), can be calculated. We show that DH can also be calculated in the kernel-induced feature space (kernelized DH) by treating that space as a high-dimensional space. Using kernelized DH as a measure of outlyingness, we conduct a permutation test with kernelized DH as the test statistic to determine whether each data point is an outlier. Since the method uses only the kernel matrix, it can detect outliers in any data type for which an appropriate kernel can be defined. Experimental results on simulated and real datasets demonstrate the competitive performance of the proposed method.
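As a rough illustration of the quantity described above, the sketch below computes a leave-one-out, kernel-only distance score: for each observation, the squared orthogonal distance in the kernel-induced feature space from the mapped point to the span of the remaining mapped points, obtained from the kernel matrix alone. The RBF kernel, the ridge stabilizer, and the use of the linear span (rather than an affine hull or a centered formulation) are assumptions for illustration only, and the permutation test that calibrates these scores is not shown; this is not the authors' exact procedure.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)


def kernelized_dh(K, ridge=1e-10):
    """Leave-one-out outlyingness scores from a kernel matrix K.

    For each i, returns the squared orthogonal distance in feature space
    from phi(x_i) to span{phi(x_j) : j != i}, i.e.
        K[i, i] - k_i^T (G + ridge * I)^{-1} k_i,
    where G is the Gram matrix of the other points and k_i their kernel
    values with x_i.  The small ridge is added only for numerical stability.
    """
    n = K.shape[0]
    dh2 = np.empty(n)
    for i in range(n):
        rest = np.delete(np.arange(n), i)
        G = K[np.ix_(rest, rest)]          # Gram matrix of the remaining points
        k_i = K[rest, i]                   # kernel values between x_i and the rest
        coef = np.linalg.solve(G + ridge * np.eye(n - 1), k_i)
        dh2[i] = K[i, i] - k_i @ coef      # residual of projecting phi(x_i) onto the span
    return np.maximum(dh2, 0.0)            # clip tiny negative values from round-off


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    X[0] += 6.0                             # plant one obvious outlier at index 0
    scores = kernelized_dh(rbf_kernel(X, gamma=0.5))
    print("most outlying index:", int(np.argmax(scores)))
```

Because the scores depend on the data only through the kernel matrix, the same routine applies to any data type for which a positive semi-definite kernel can be defined, as the abstract notes.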

Keywords

Outlier detection

Distance to hyperplane

Kernel method

Kernel-induced feature space 

Main Sponsor

Section on Statistical Learning and Data Science