Thursday, Aug 7: 8:30 AM - 10:20 AM
0855
Topic-Contributed Paper Session
Music City Center
Room: CC-106C
Applied: Yes
Main Sponsor
Privacy and Confidentiality Interest Group
Co Sponsors
Committee on Privacy and Confidentiality
Government Statistics Section
Presentations
In statistical applications, it has become increasingly common to encounter data structures that inherently live on non-linear spaces such as manifolds. Geodesic regression has emerged as an extension of classical linear regression, one of the most fundamental methodologies of statistical learning, to the setting where the response variable lives on a Riemannian manifold.
As with linear regression, the fitted parameters of geodesic regression can capture relationships in sensitive data, and hence their release calls for privacy protection.
We consider releasing Differentially Private (DP) parameters of geodesic regression via the K-Norm Gradient (KNG) mechanism for Riemannian manifolds. We derive theoretical bounds for the sensitivity of the parameters, showing that they are tied to their respective Jacobi fields and hence to the curvature of the space, which corroborates recent findings on differential privacy for the Fréchet mean. We demonstrate the efficacy of our methodology on a 2D sphere, though it is general to Riemannian manifolds, making it suitable for data in domains such as medical imaging and computer vision.
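For orientation, the following is a minimal sketch of the K-Norm Gradient idea in the simplest possible setting: a one-dimensional Euclidean mean rather than the manifold-valued geodesic regression studied above. The data bound, sensitivity constant, and Metropolis sampler settings are illustrative assumptions, not the calibration used in the paper.

import numpy as np

def kng_mean_1d(data, epsilon, n_steps=20000, step=0.05, rng=None):
    """Illustrative KNG sketch for a 1-D mean (NOT the manifold method above).
    Loss: L(theta) = (1/n) * sum_i (x_i - theta)^2, so grad L(theta) = 2*(theta - xbar).
    With records clipped to [0, 1], changing one record moves the gradient by at
    most Delta = 2/n, and KNG samples theta from a density proportional to
    exp(-epsilon * |grad L(theta)| / (2 * Delta)), here via a Metropolis random walk."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(data, dtype=float), 0.0, 1.0)
    delta = 2.0 / len(x)                                   # gradient sensitivity
    log_target = lambda t: -epsilon * abs(2.0 * (t - x.mean())) / (2.0 * delta)
    theta = 0.5                                            # arbitrary starting point
    for _ in range(n_steps):
        proposal = theta + rng.normal(scale=step)
        if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
            theta = proposal
    return theta

# Example: a private estimate of the mean of 500 bounded values at epsilon = 1.
rng = np.random.default_rng(0)
print(kng_mean_1d(rng.beta(2, 5, size=500), epsilon=1.0, rng=rng))

In this one-dimensional case the KNG density reduces to a Laplace-type distribution centered at the sample mean; the manifold setting replaces the gradient with a Riemannian gradient, whose sensitivity the abstract ties to Jacobi fields and the curvature of the space.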
Investigating trends in health outcomes at fine geographic levels is crucial for identifying and addressing geographic disparities, but the data necessary for those analyses are often not publicly available due to the risk of disclosing sensitive information about the underlying data subjects. Recent work at the intersection of spatial statistics and data privacy has aimed to develop methods suitable for producing spatially referenced synthetic data with provable privacy guarantees that preserve the disparities present in the original data. In this study, we use the differentially private Poisson-gamma mechanism to produce a synthetic dataset comprising annual tract-level heart disease related death counts stratified by age, race, and sex for the state of Minnesota. We then shift our focus to Hennepin County, Minnesota's most populous county, and compare an analysis of spatiotemporal trends in heart disease death rates from 2010-2019 based on the synthetic data to one based on the true data.
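To make the synthesis step concrete, here is a minimal sketch of the Poisson-gamma idea for a single stratum of counts; the prior parameters a and b below are placeholder values and would have to be chosen according to the mechanism's privacy analysis rather than arbitrarily as shown.

import numpy as np

def poisson_gamma_synthetic(deaths, population, a, b, rng=None):
    """Sketch of Poisson-gamma synthesis for count data.
    Model: deaths[i] ~ Poisson(population[i] * rate[i]) with a Gamma(a, b)
    prior on each rate. A synthetic count is drawn by sampling the rate from
    its Gamma posterior and then drawing a new Poisson count. The differential
    privacy guarantee of the actual mechanism is governed by the choice of
    (a, b), which is not derived here."""
    rng = np.random.default_rng() if rng is None else rng
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)
    # Gamma posterior for each tract's rate: shape a + y, rate b + n.
    rates = rng.gamma(shape=a + deaths, scale=1.0 / (b + population))
    # Synthetic counts from the resulting Poisson model.
    return rng.poisson(population * rates)

# Example: three hypothetical tracts with illustrative prior parameters.
print(poisson_gamma_synthetic([5, 12, 0], [1800, 4300, 950], a=1.0, b=100.0))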
As the use of differential privacy (DP) becomes widespread, the development of effective tools for reasoning about the privacy guarantee becomes increasingly critical. In pursuit of this goal, we demonstrate novel relationships between DP and measures of statistical disclosure risk. We suggest how experts and non-experts can use these results to explain the DP guarantee, interpret DP composition theorems, select and justify privacy parameters, and identify worst-case adversary prior probabilities.
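As one concrete example of the kind of relationship between DP and disclosure risk (a standard consequence of the definition, not necessarily the specific results of this work): under epsilon-DP, a worst-case adversary's posterior odds that a target record is present can exceed their prior odds by at most a factor of exp(epsilon).

import math

def posterior_bound(prior, epsilon):
    """Upper bound on an adversary's posterior probability that a target record
    is in the data, given prior probability `prior`, under epsilon-DP.
    Posterior odds <= exp(epsilon) * prior odds, then convert back to a probability."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = math.exp(epsilon) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Example: with a prior of 0.5 and epsilon = 1, the posterior is at most ~0.73.
print(posterior_bound(0.5, 1.0))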
Sufficient statistic perturbation (SSP) is a widely used method for differentially private linear regression. SSP adopts a data-independent approach where privacy noise from a simple distribution is added to sufficient statistics. However, sufficient statistics can often be expressed as linear queries and better approximated by data-dependent mechanisms. We introduce data-dependent SSP for linear regression based on post-processing privately released marginals, and find that it outperforms state-of-the-art data-independent SSP. We extend this result to logistic regression by developing an approximate objective that can be expressed in terms of sufficient statistics, resulting in a novel and highly competitive SSP approach for logistic regression. We also make a connection to synthetic data for machine learning: for models with sufficient statistics, training on synthetic data corresponds to data-dependent SSP, with the overall utility determined by how well the mechanism answers these linear queries.
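For reference, a minimal sketch of the data-independent SSP baseline described above follows; the row-norm clipping bound, noise scale, and ridge term are illustrative assumptions, and a real implementation would calibrate the Gaussian noise to the clipping bound and the (epsilon, delta) budget.

import numpy as np

def dp_linear_regression_ssp(X, y, noise_scale, clip=1.0, ridge=1e-3, rng=None):
    """Data-independent SSP sketch: perturb the sufficient statistics X^T X and
    X^T y with Gaussian noise, then solve the (regularized) normal equations.
    `noise_scale` stands in for a Gaussian-mechanism standard deviation that
    would be calibrated from the clipping bound and the privacy budget; the
    response y is assumed to be bounded as well."""
    rng = np.random.default_rng() if rng is None else rng
    # Clip each row's norm so the sufficient statistics have bounded sensitivity.
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True) / clip, 1.0)
    Xc = X / norms
    d = Xc.shape[1]
    XtX = Xc.T @ Xc + rng.normal(scale=noise_scale, size=(d, d))
    XtX = (XtX + XtX.T) / 2.0                    # keep the noisy matrix symmetric
    Xty = Xc.T @ y + rng.normal(scale=noise_scale, size=d)
    # A small ridge term keeps the noisy normal equations well conditioned.
    return np.linalg.solve(XtX + ridge * np.eye(d), Xty)

# Example with synthetic data: recover an approximate coefficient vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.1, size=2000)
print(dp_linear_regression_ssp(X, y, noise_scale=1.0, rng=rng))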
In this talk, I will discuss adaptation in the context of estimating a functional of an unknown density under differential privacy constraints. The talk is based on joint work with Tony Cai and Abhinav Chakraborty, in which we derive theoretical upper and lower bounds on the performance of methods that adapt between different classes, and exhibit differentially private methods that successfully adapt between unknown function classes. Our theory shows that, for certain classes of functions, the cost of adaptation can be substantially higher for adaptive differentially private protocols than for their non-private counterparts.