Maintaining Privacy in Increasingly Public Societies

Jihyeon Kwon Chair
 
Maryclare Griffin Organizer
 
Thursday, Aug 7: 8:30 AM - 10:20 AM
0855 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-106C 

Applied: Yes

Main Sponsor

Privacy and Confidentiality Interest Group

Co-Sponsors

Committee on Privacy and Confidentiality
Government Statistics Section

Presentations

Differentially Private Geodesic Regression

Statistical applications increasingly involve data structures that inherently live on non-linear spaces such as manifolds. Geodesic regression emerged as an extension of classical linear regression, one of the most fundamental methodologies of statistical learning, to the setting where the response variable lives on a Riemannian manifold. As with linear regression, the fitted parameters of geodesic regression can capture relationships among sensitive data, so their release calls for privacy protection.
We consider releasing differentially private (DP) parameters of geodesic regression via the K-Norm Gradient (KNG) mechanism for Riemannian manifolds. We derive theoretical bounds on the sensitivity of the parameters, showing they are tied to their respective Jacobi fields and hence to the curvature of the space, which corroborates recent findings on differential privacy for the Fréchet mean. We demonstrate the efficacy of our methodology on the 2D sphere, though it is general to Riemannian manifolds, making it suitable for data in domains such as medical imaging and computer vision.
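The KNG mechanism predates the manifold setting; a minimal Euclidean sketch may help fix ideas. Here the released slope of a one-dimensional regression is sampled with density proportional to exp(-ε‖∇L(θ)‖/(2Δ)) via a simple Metropolis random walk. The function name, the step size, and the sensitivity bound Δ are illustrative assumptions, not the paper's manifold construction:

```python
import numpy as np

rng = np.random.default_rng(1)

def kng_release_slope(x, y, epsilon, n_iter=20000):
    """Euclidean analogue of the K-Norm Gradient (KNG) mechanism,
    using R^1 and the 2-norm (the paper works on Riemannian manifolds).
    Samples theta with density proportional to
    exp(-epsilon * |grad L(theta)| / (2 * Delta)) via Metropolis.
    Assumes |x_i|, |y_i| <= 1.
    """
    n = len(x)
    delta = 4.0 / n  # illustrative gradient-sensitivity bound, not tight

    def log_target(theta):
        # Gradient of the mean squared-error loss at theta.
        grad = -2.0 * np.mean(x * (y - theta * x))
        return -epsilon * abs(grad) / (2.0 * delta)

    theta = 0.0
    for _ in range(n_iter):
        prop = theta + rng.normal(0.0, 0.1)
        if np.log(rng.uniform()) < log_target(prop) - log_target(theta):
            theta = prop
    return theta

# Toy data with true slope 0.7, clipped to the assumed bounds.
x = rng.uniform(-1.0, 1.0, 200)
y = np.clip(0.7 * x + 0.05 * rng.standard_normal(200), -1.0, 1.0)
theta_dp = kng_release_slope(x, y, epsilon=5.0)
```

The target density peaks where the gradient vanishes, i.e. at the non-private estimate, and decays at a rate controlled by ε, which is what makes KNG attractive relative to output perturbation.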

Co-Author(s)

Aditya Kulkarni, University of Massachusetts Amherst
Carlos Soto, UMass Amherst

Speaker

Carlos Soto, UMass Amherst

Evaluating Within-County Disparities in Health Outcomes Using Synthetic Data

Investigating trends in health outcomes at fine geographic levels is crucial for identifying and addressing geographic disparities, but the data necessary for such analyses are often not publicly available due to the risk of disclosing sensitive information about the underlying data subjects. Recent work at the intersection of spatial statistics and data privacy has aimed to develop methods for producing spatially referenced synthetic data with provable privacy guarantees that preserve the disparities present in the original data. In this study, we use the differentially private Poisson-gamma mechanism to produce a synthetic dataset of annual tract-level heart disease-related death counts stratified by age, race, and sex for the state of Minnesota. We then shift our focus to Minnesota's most populous county, Hennepin County, and compare an analysis of spatiotemporal trends in heart disease death rates from 2010-2019 based on the synthetic data to one based on the true data.
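The Poisson-gamma idea can be sketched generically: model counts as y_i ~ Poisson(pop_i · λ_i) with a conjugate Gamma prior on each rate, and draw synthetic counts from the posterior predictive. In the differentially private version used in the talk, the prior parameters are chosen to yield a formal privacy guarantee; the values and the helper name below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def poisson_gamma_synthetic(y, pop, a, b):
    """Posterior-predictive Poisson-gamma sampler (illustrative sketch).
    y_i ~ Poisson(pop_i * lam_i), lam_i ~ Gamma(a, rate b), so the
    posterior for each rate is Gamma(a + y_i, rate b + pop_i).
    """
    lam = rng.gamma(a + y, 1.0 / (b + pop))  # NumPy uses scale = 1/rate
    return rng.poisson(pop * lam)

# Toy tract-level counts: prior mean rate a/b = 0.002 deaths per person.
y = np.array([8, 12, 5])
pop = np.array([4000, 6000, 2500])
syn = poisson_gamma_synthetic(y, pop, a=2.0, b=1000.0)
```

Sampling λ and then a Poisson draw is equivalent to sampling from the negative-binomial posterior predictive; a more informative prior shrinks the synthetic counts toward the prior mean, trading fidelity for protection.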

Speaker

Harrison Quick, University of Minnesota

Interpreting Differential Privacy in Terms of Disclosure Risk

As the use of differential privacy (DP) becomes widespread, the development of effective tools for reasoning about the privacy guarantee becomes increasingly critical. In pursuit of this goal, we demonstrate novel relationships between DP and measures of statistical disclosure risk. We suggest how experts and non-experts can use these results to explain the DP guarantee, interpret DP composition theorems, select and justify privacy parameters, and identify worst-case adversary prior probabilities.
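One standard bridge between DP and disclosure risk (the talk's specific measures may differ) is that an ε-DP release can raise an adversary's odds on any hypothesis about a record by at most a factor of e^ε. A minimal sketch of the resulting posterior bound:

```python
import math

def posterior_bound(prior, epsilon):
    """Upper bound on an adversary's posterior probability of a
    hypothesis about a target record after observing an epsilon-DP
    release: posterior odds <= exp(epsilon) * prior odds.
    """
    odds = prior / (1.0 - prior)
    post_odds = math.exp(epsilon) * odds
    return post_odds / (1.0 + post_odds)

# e.g. a 1% prior under epsilon = 1 can rise to at most about 2.7%
risk = posterior_bound(0.01, 1.0)
```

Statements of this form translate an abstract ε into a concrete worst-case disclosure probability, which is the kind of interpretation non-experts can act on when selecting privacy parameters.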

Speaker

Zeki Kazan, Duke University

Private Regression via Data-Dependent Sufficient Statistic Perturbation

Sufficient statistic perturbation (SSP) is a widely used method for differentially private linear regression. SSP adopts a data-independent approach where privacy noise from a simple distribution is added to sufficient statistics. However, sufficient statistics can often be expressed as linear queries and better approximated by data-dependent mechanisms. We introduce data-dependent SSP for linear regression based on post-processing privately released marginals, and find that it outperforms state-of-the-art data-independent SSP. We extend this result to logistic regression by developing an approximate objective that can be expressed in terms of sufficient statistics, resulting in a novel and highly competitive SSP approach for logistic regression. We also make a connection to synthetic data for machine learning: for models with sufficient statistics, training on synthetic data corresponds to data-dependent SSP, with the overall utility determined by how well the mechanism answers these linear queries. 
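A minimal sketch of the data-independent SSP baseline the abstract contrasts against: add calibrated noise to the sufficient statistics XᵀX and Xᵀy, then solve the normal equations on the noisy statistics. The function name, clipping convention, and the loose sensitivity bound are illustrative assumptions, not the paper's data-dependent method:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_linear_regression_ssp(X, y, epsilon, bound=1.0):
    """Data-independent sufficient statistic perturbation (SSP) for
    linear regression: perturb X^T X and X^T y with Laplace noise and
    solve the perturbed normal equations.
    Assumes every entry of X and y has been clipped to [-bound, bound].
    """
    n, d = X.shape
    # L1 sensitivity of the stacked statistics when one clipped row is
    # added or removed; this simple bound is illustrative, not tight.
    sensitivity = (d * d + d) * bound**2
    scale = sensitivity / epsilon

    XtX = X.T @ X + rng.laplace(0.0, scale, size=(d, d))
    Xty = X.T @ y + rng.laplace(0.0, scale, size=d)
    # Symmetrize and regularize so the noisy system stays solvable.
    XtX = (XtX + XtX.T) / 2 + 1e-6 * np.eye(d)
    return np.linalg.solve(XtX, Xty)

# Toy data: responses lie in [-1, 1] since |theta|_1 <= 1.
X = rng.uniform(-1.0, 1.0, (500, 3))
theta_true = np.array([0.5, -0.3, 0.2])
y = X @ theta_true
theta_hat = dp_linear_regression_ssp(X, y, epsilon=5.0)
```

The abstract's point is that XᵀX and Xᵀy are linear queries, so a data-dependent marginal-release mechanism can answer them more accurately than the fixed Laplace noise used here.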

Co-Author

Daniel Sheldon, University of Massachusetts Amherst

Speaker

Cecilia Ferrando

The Cost of Adaptation under Differential Privacy

In this talk, I will discuss adaptation in the context of estimating a functional of an unknown density under differential privacy constraints. The talk is based on joint work with Tony Cai and Abhinav Chakraborty, in which we derive theoretical upper and lower bounds on the performance of methods that adapt between different classes, and exhibit differentially private methods that successfully adapt between unknown function classes. Our theory shows that, for certain classes of functions, the cost of adaptation can be substantially higher for adaptive differentially private protocols than for their non-private counterparts.

Speaker

Lasse Vuursteen, University of Pennsylvania