Model interpretation after using random projections: An applied study on travel disability data

Keegan Kang Co-Author
Bucknell University
 
Shiya Cao Speaker
Smith College
 
Wednesday, Aug 7: 11:25 AM - 11:50 AM
Invited Paper Session 
Oregon Convention Center 
The National Household Travel Survey (NHTS) asks respondents whether they have a medical condition "that makes it difficult to travel outside of home", which this research defines as travel disability. The NHTS allows us to investigate the effects of disability on travel behavior; however, releasing it may expose sensitive information about respondents' medical conditions and travel. We use a differential privacy algorithm based on random projection to obtain a random dataset that contains the summary statistics of the sample dataset, so that useful aggregate information can be released and used for its intended purposes while the privacy of the individuals in the sample dataset is preserved. The main idea of the algorithm is to project the sample dataset (n by p) onto a smaller random dataset (k by p). We fit a linear regression model to the random dataset and compare its statistics of interest with those of the sample dataset. This lets us assess how accurately the random projection reproduces the original sample and then make statements about the statistics of interest in the true population.
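
As a rough illustration of the projection step described above, the following Python sketch projects an n by p dataset down to k by p and compares least-squares coefficients before and after. Everything specific here is an assumption made for illustration: the data are synthetic rather than NHTS records, the projection matrix has Gaussian entries with variance 1/k (the abstract does not say which distribution is used), the helper names `project_rows` and `ols` are hypothetical, and the sketch makes no attempt to calibrate or verify a formal privacy guarantee.

```python
import numpy as np


def project_rows(data, k, rng):
    """Compress an (n, p) array to (k, p) by left-multiplying with a
    k-by-n Gaussian matrix (an assumed choice of projection)."""
    n = data.shape[0]
    S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, n))
    return S @ data


def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n, p, k = 5000, 4, 500  # illustrative sizes only

# Synthetic stand-in for the sample dataset: intercept plus three predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Project predictors and response together so that each projected row
# mixes the same respondents in both.
Z = project_rows(np.column_stack([X, y]), k, rng)
X_proj, y_proj = Z[:, :-1], Z[:, -1]

# The intercept column is projected too, so no new intercept is added.
beta_full = ols(X, y)
beta_proj = ols(X_proj, y_proj)
max_abs_diff = np.max(np.abs(beta_full - beta_proj))

print("sample coefficients:   ", np.round(beta_full, 3))
print("projected coefficients:", np.round(beta_proj, 3))
print("largest absolute gap:  ", round(float(max_abs_diff), 3))
```

Each row of the projected dataset is a random linear combination of all n respondents, so no projected row corresponds to a single individual, while the regression fitted to the k projected rows stays close to the one fitted to the full sample. Increasing k generally tightens that agreement at the cost of releasing more rows.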