Wednesday, Aug 6: 2:00 PM - 3:50 PM
4201
Contributed Papers
Music City Center
Room: CC-Davidson Ballroom A2
Main Sponsor
Survey Research Methods Section
Presentations
Propensity score weighting is a common method for estimating treatment effects with survey data. The method is applied to minimize confounding using measured covariates that often differ between individuals in the treatment and control groups. However, the existing literature has not reached a consensus on the optimal use of survey weights for population-level inference in propensity score weighting analyses. Under the balancing weights framework, we provide a unified solution for incorporating survey weights in both the propensity score estimation and the outcome regression model. We derive estimators for different target populations, including the combined, treated, control, and overlap populations. We provide a unified expression of the sandwich variance estimator and demonstrate, through the theory of M-estimators, that the survey-weighted estimator is asymptotically normal. Through an extensive series of simulation studies, we examine the performance of our derived estimators and compare the results to those of alternative methods. We further carry out two case studies to illustrate the application of the different methods of propensity score analysis.
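As an illustration of the balancing weights idea mentioned in this abstract, the following is a minimal sketch, not the authors' estimator. It assumes the propensity scores `e` have already been estimated, uses the commonly cited tilting functions for the four target populations, and multiplies each balancing weight by the survey weight inside a ratio (Hájek-type) estimator. The function names `tilting`, `balancing_weight`, and `weighted_effect` are hypothetical.

```python
def tilting(e, target):
    """Tilting function h(x) that defines the target population (assumed forms)."""
    if target == "combined":
        return 1.0
    if target == "treated":
        return e
    if target == "control":
        return 1.0 - e
    if target == "overlap":
        return e * (1.0 - e)
    raise ValueError(f"unknown target population: {target}")


def balancing_weight(z, e, target):
    """Balancing weight: h/e for treated units, h/(1 - e) for control units."""
    h = tilting(e, target)
    return h / e if z == 1 else h / (1.0 - e)


def weighted_effect(y, z, e, sw, target="overlap"):
    """Difference of weighted outcome means, using survey weight x balancing weight."""
    num1 = den1 = num0 = den0 = 0.0
    for yi, zi, ei, si in zip(y, z, e, sw):
        w = si * balancing_weight(zi, ei, target)
        if zi == 1:
            num1 += w * yi
            den1 += w
        else:
            num0 += w * yi
            den0 += w
    return num1 / den1 - num0 / den0


# Toy data (made up for illustration only).
y = [3.0, 3.0, 1.0, 1.0]
z = [1, 1, 0, 0]
e = [0.5, 0.5, 0.5, 0.5]
sw = [1.0, 2.0, 3.0, 4.0]
effect = weighted_effect(y, z, e, sw, target="overlap")
```

Under the overlap target, a treated unit receives weight 1 - e and a control unit receives weight e, so units with extreme propensity scores are down-weighted rather than trimmed.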
Keywords
complex survey
propensity score weighting
survey weights
overlap weights
In finite population survey sampling, model-assisted regression estimation was developed to incorporate auxiliary information efficiently. When the auxiliary data are high-dimensional, adding too many auxiliary variables may increase the estimation error and lead to biased estimation. Particularly under informative sampling, the bias of the high-dimensional regression estimator may not be negligible. In this paper, we present a novel application of the sample-split estimation method for regression estimation under informative sampling. The proposed method is shown to be consistent even when the auxiliary variables are high-dimensional and the sampling design is informative. Variance estimation for the sample-split estimator is discussed. Results from a limited simulation study are also presented.
Keywords
Sample-split estimation
Informative sampling
Model-assisted estimation
High-dimensional regression
The Iterative Proportional Fitting (IPF) algorithm is widely used for survey weighting and synthetic population generation. While efficient in low-dimensional settings, IPF struggles with zero-cell issues in sparse contingency tables and becomes computationally infeasible as dimensionality increases. To address these challenges, we propose a block-wise IPF framework that partitions variables into smaller, correlated feature groups, applying IPF independently within each group. Simulation studies and real-world synthetic population experiments demonstrate that this approach significantly improves computational efficiency and scalability in high-dimensional settings while maintaining a reasonable fit to marginal distributions and preserving inter-variable dependencies comparable to standard IPF. Furthermore, we introduce a hybrid framework that integrates IPF-synthesized data with generative models such as Bayesian networks and Tabular Variational Autoencoders. This approach ensures accurate marginal fitting while enhancing realism and diversity in synthetic populations. Our contributions improve upon standard IPF and generative models, advancing synthetic population modeling.
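For readers unfamiliar with the baseline, here is a minimal sketch of standard two-way IPF (not the block-wise variant proposed in this abstract). It alternately rescales rows and columns of a seed table until the row sums match their targets. The function name `ipf`, the tolerance, and the toy tables are assumptions made for illustration.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Standard two-way IPF on a list-of-lists table; returns the fitted table."""
    table = [row[:] for row in seed]  # copy so the seed is left unchanged
    for _ in range(max_iter):
        # Row step: scale each row to match its target total.
        for i, row in enumerate(table):
            s = sum(row)
            if s > 0:
                factor = row_targets[i] / s
                table[i] = [v * factor for v in row]
        # Column step: scale each column to match its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                factor = target / s
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows also match.
        err = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if err < tol:
            break
    return table


# Dense seed: both margins can be matched.
fitted = ipf([[1.0, 2.0], [3.0, 4.0]], [40.0, 60.0], [35.0, 65.0])

# Sparse seed: the zero cell stays zero no matter how many iterations run.
sparse = ipf([[0.0, 1.0], [1.0, 1.0]], [10.0, 20.0], [12.0, 18.0])
```

Because every update is a multiplicative rescaling, a cell that starts at zero can never become positive, which is one way to see the zero-cell issue described above.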
Keywords
Iterative Proportional Fitting (IPF)
block-wise IPF
synthetic population generation
high-dimensional data
contingency tables
marginal constraints
scalability
zero-cell issues
computational efficiency
survey weighting
generative models
Bayesian networks
tabular variational autoencoders (TVAEs)
There is interest in estimating Consumer Price Indexes (CPI) for small Core-Based Statistical Areas (CBSAs) and states. Currently, consumer prices are sampled in select CBSAs with the goal of providing reliable index estimates at the national level, at the Census division level, and for CBSAs with sufficiently large populations. We use hierarchical Bayesian models and incorporate covariates and spatio-temporal correlations of consumer prices, with the idea that accounting for these correlations will compensate for the sparseness of the collected data and allow for reliable predictions in the small areas. Our research presented at JSM 2024 demonstrated the utility of accounting for spatial correlations. We are currently investigating whether a series of temporal estimates in CBSAs will compensate for the sparseness of direct cross-sectional estimates. We check our model assumptions by comparing estimated and predicted fuel prices with estimates from large administrative datasets.
Keywords
small area estimation
hierarchical Bayesian models
spatio-temporal correlations
Stan
Gaussian processes
The increasing availability of survey data for causal inference on treatment effects presents new opportunities, yet most methods assume ignorability of treatment and non-informative sampling. In practice, survey data often include survey weights, but the sampling is frequently informative (i.e., dependent on the outcome given covariates and treatment), especially when design details are undisclosed. The optimal use of survey weights for causal inference under such sampling is an open problem. We show how survey weights can enhance the efficiency of Horvitz-Thompson estimators. Specifically, we derive the efficient influence function within the class of regular asymptotically linear estimators and propose a novel estimator based on it. Using a super-population framework, we establish its doubly robust property and, via M-estimation, prove its root-N asymptotic normality under parametric nuisance modeling. To enable flexible ML methods, we extend the theory to show our estimator ensures faster-than-root-N rates when the product of nuisance function rates exceeds root-N. We support our theoretical findings through extensive simulations and analysis of the Medical Expenditure Panel Survey data.
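To make the doubly robust idea concrete, the sketch below shows a textbook augmented inverse probability weighted (AIPW) estimator with survey weights applied as a ratio. It is not the efficient estimator proposed in this abstract. It assumes the nuisance quantities (propensity scores `e` and outcome predictions `m1`, `m0`) have been fitted elsewhere; the function name `survey_weighted_aipw` and the toy numbers are hypothetical.

```python
def survey_weighted_aipw(y, z, e, m1, m0, sw):
    """Survey-weighted AIPW estimate of the population average treatment effect.

    y: outcomes, z: treatment indicators (0/1), e: propensity scores,
    m1/m0: predicted outcomes under treatment/control, sw: survey weights.
    """
    num = 0.0
    den = 0.0
    for yi, zi, ei, m1i, m0i, si in zip(y, z, e, m1, m0, sw):
        # Outcome-model contrast plus inverse-probability-weighted residual corrections.
        phi = (
            (m1i - m0i)
            + zi * (yi - m1i) / ei
            - (1 - zi) * (yi - m0i) / (1.0 - ei)
        )
        num += si * phi
        den += si
    return num / den


# Toy data (made up): the outcome predictions match the observed outcomes,
# so the residual corrections vanish and the propensity values do not matter.
y = [5.0, 1.0, 6.0, 2.0]
z = [1, 0, 1, 0]
m1 = [5.0, 4.0, 6.0, 5.0]
m0 = [2.0, 1.0, 3.0, 2.0]
sw = [1.0, 2.0, 3.0, 4.0]
ate = survey_weighted_aipw(y, z, [0.9, 0.1, 0.3, 0.7], m1, m0, sw)
```

The toy case illustrates one half of double robustness: when the outcome model is right, a poor propensity model does not change the estimate.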
Keywords
Complex survey
Data-Adaptive Method
Doubly Robust Estimation
Empirical process
Population average treatment effect
The Population Assessment of Tobacco and Health (PATH) Study is a national longitudinal study of tobacco use (2013-2021) that requires balanced repeated replicate weights for analysis. Crude and multivariable weighted interval-censored Cox proportional hazards models were used to estimate two interaction effects on the age of asthma onset: (1) sex and years since first hookah use, and (2) ethnicity and years since first hookah use. After controlling for covariates, women, Hispanic adults, and non-Hispanic Black adults who reported one or more years since first hookah use had increased risks of asthma onset at earlier ages in comparison to men and non-Hispanic White adults who reported never using hookah (HR = 4.93, 95% CI 2.10-11.58; HR = 5.18, 95% CI 2.21-12.16; and HR = 1.63, 95% CI 1.09-2.43, respectively). The interaction of sex and race/ethnicity with past 30-day (P30D) electronic cigarette (ENDS) use on the age of asthma onset was also estimated. Disseminating the results among health providers and the public about the interaction effect of sex and race/ethnicity with years since first hookah use or P30D ENDS use on earlier ages of asthma onset may encourage users to stop.
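The replicate-weight approach mentioned in this abstract can be sketched for a simple statistic. The example below applies the standard Fay-adjusted balanced repeated replication (BRR) variance formula to a weighted mean rather than to the Cox models used in the study. The Fay coefficient `rho` and the replicate weights are made-up inputs; real values must come from the survey's documentation. The function names are hypothetical.

```python
def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def fay_brr_variance(y, full_weights, replicate_weights, rho):
    """Fay's BRR variance: sum of squared replicate deviations / (R * (1 - rho)^2)."""
    theta = weighted_mean(y, full_weights)
    reps = [weighted_mean(y, w) for w in replicate_weights]
    r = len(reps)
    return sum((t - theta) ** 2 for t in reps) / (r * (1.0 - rho) ** 2)


# Toy example (made up): two units, two replicates, Fay coefficient 0.5.
y = [1.0, 3.0]
full_w = [1.0, 1.0]
rep_w = [[1.5, 0.5], [0.5, 1.5]]
var_hat = fay_brr_variance(y, full_w, rep_w, rho=0.5)
```

The same recipe extends to model coefficients: refit the model once per replicate weight set and plug the replicate estimates into the formula in place of the replicate means.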
Keywords
Sampling weights
Fay's variance estimation
Balanced Repeated Replicate Weights
Interval-Censoring Hazard Function
hazard risk
age of onset
Co-Author
Sarah Valencia, Michael and Susan Dell Center for Healthy Living
First Author
Adriana Perez, University of Texas At Houston, Health Science Center
Presenting Author
Adriana Perez, University of Texas At Houston, Health Science Center