Wednesday, Aug 6: 10:30 AM - 12:20 PM
4177
Contributed Posters
Music City Center
Room: CC-Hall B
Main Sponsor
Survey Research Methods Section
Presentations
General (G)-factor latent variable models are used to characterize the factor structure of unobserved variables. Bifactor models and their alternatives are commonly applied to G-factor-based instruments. Despite their advantages for data structures with a dominant G factor, bifactor models can produce anomalous findings, and alternative bifactor models have been advanced to overcome these problems. There remains insufficient evidence regarding the degree to which such models recover known data structures, especially under model misspecification. This presentation reports a simulation study examining parameter recovery and estimation convergence for a series of models: unidimensional, bifactor, bifactor (S-1), and bifactor (SI-1). Of specific interest is recovery of the regression coefficient of a binary predictor on the primary dimension in a multiple-indicator multiple-cause (MIMIC) model, especially in the presence of model misspecification. Manipulated conditions include sample size, factor correlation, number of indicators, and magnitude of group differences on primary and domain-specific latent means. The presentation includes full results and recommendations.
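As a rough illustration of the kind of data such a simulation study generates, the sketch below draws one replication from a bifactor structure in which a binary predictor shifts the general factor. It is not the presenters' design: the function name, loadings, effect size, unit residual variances, and orthogonal factors are all assumptions chosen for illustration, and no model is fitted here.

```python
import random


def simulate_bifactor_mimic(n, gamma, general_loadings, specific_loadings,
                            domain_of, seed=0):
    """Draw n cases from a hypothetical bifactor MIMIC data-generating model.

    Assumptions (not from the abstract): orthogonal standard-normal factors,
    unit residual variances, and a group effect on the general factor only.
    """
    rng = random.Random(seed)
    n_domains = max(domain_of) + 1
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)                      # binary predictor (group)
        g = gamma * x + rng.gauss(0.0, 1.0)        # general factor regressed on x
        s = [rng.gauss(0.0, 1.0) for _ in range(n_domains)]  # specific factors
        y = [lg * g + ls * s[d] + rng.gauss(0.0, 1.0)
             for lg, ls, d in zip(general_loadings, specific_loadings, domain_of)]
        rows.append((x, y))
    return rows


# Six indicators in two domains; all numeric values are illustrative.
data = simulate_bifactor_mimic(
    n=20000,
    gamma=0.5,
    general_loadings=[0.8] * 6,
    specific_loadings=[0.5] * 6,
    domain_of=[0, 0, 0, 1, 1, 1],
    seed=1,
)

# Sanity check: the group difference on indicator 1 should be near 0.8 * 0.5.
group1 = [y[0] for x, y in data if x == 1]
group0 = [y[0] for x, y in data if x == 0]
mean_diff = sum(group1) / len(group1) - sum(group0) / len(group0)
```

In a full study, each replication would be fitted with the competing models and the estimated coefficient compared with the generating value of `gamma`.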
Keywords
Latent variable modeling
Simulation study
Multiple-indicator multiple-cause (MIMIC) modeling
G-Factor models
Parameter recovery
Regular dental visits are essential for oral health, yet disparities between regions exist due to socioeconomic and geographic factors. While national surveys provide valuable data on dental care utilization, they often lack sufficient sample sizes to generate reliable county-level estimates. Small area estimation (SAE) techniques help address this gap by producing robust estimates for smaller geographic areas. This study introduces a hybrid approach combining multilevel modeling with the raking procedure to estimate county-level dental care utilization among adults in California. Using Behavioral Risk Factor Surveillance System (BRFSS) and census data, our method accounts for individual- and area-level factors while overcoming data constraints that limit SAE methods like multilevel regression and post-stratification. We validate our estimates by comparing them with BRFSS direct estimates and available county-level estimates from the California Health Interview Survey. The findings demonstrate the feasibility of this approach in generating county-level estimates, supporting public health planning and targeted interventions to reduce disparities in dental care utilization.
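The raking step mentioned above can be sketched as iterative proportional fitting on a two-way table. This is a minimal, generic illustration, not the authors' hybrid procedure: the function name `rake`, the 2-by-2 table of model-based counts, and the target margins are hypothetical stand-ins for county-by-demographic cells and census totals.

```python
def rake(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Adjust a 2-D table of counts so its margins match the target totals."""
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(max_iter):
        # Scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                factor = target / total
                cells[i] = [v * factor for v in cells[i]]
        # Scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                factor = target / total
                for row in cells:
                    row[j] *= factor
        # Columns match exactly after the column step, so check the rows.
        gap = max(abs(sum(cells[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return cells


# Hypothetical starting counts (rows: two counties, columns: two age groups)
# and hypothetical known margins that both sum to 100.
start = [[10.0, 20.0], [30.0, 40.0]]
raked = rake(start, row_targets=[40.0, 60.0], col_targets=[50.0, 50.0])
```

Raking needs only the marginal totals, which is one reason it can be used when the full cross-classified population counts required for poststratification are unavailable.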
Keywords
Small area estimation
Raking
Multilevel regression
Dental care utilization
Co-Author
Honghu Liu, Department of Biostatistics, UCLA
First Author
Yilan Huang, Department of Biostatistics, UCLA
Presenting Author
Yilan Huang, Department of Biostatistics, UCLA
Bayesian statistical methods have become increasingly popular in health and social science research for their intuitive framework and their ability to accommodate hierarchical data structures and missing data. However, accounting for complex sample design elements such as weights, stratification, and clustering is not straightforward. We propose a novel extension of the finite population Bayesian bootstrap (FPBB) in which synthetic populations are generated and posterior draws obtained assuming a simple random sample design are re-weighted using importance sampling. We evaluate our approach through a simulation study of a stratified sample in a misspecified linear modeling setting and compare results to an existing method. Results demonstrate adequate coverage, with only mildly inflated empirical variances. Compared to the existing method, our approach is computationally faster and produces comparably unbiased estimates and coverage. This simple and generalizable approach will have significant implications for survey data analysts by allowing for implementation of complex Bayesian models while accounting for sampling designs.
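The idea of re-weighting existing posterior draws by importance sampling can be illustrated with a generic self-normalized estimator. This sketch does not reproduce the FPBB extension described above: the toy proposal N(0, 1), the toy target N(1, 1), and all function names are assumptions made purely for illustration.

```python
import math
import random


def self_normalized_weights(log_weights):
    """Turn log importance weights into weights that sum to one."""
    top = max(log_weights)  # subtract the max for numerical stability
    unnorm = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def reweighted_mean(draws, log_weights):
    """Importance-weighted estimate of the mean under the target."""
    weights = self_normalized_weights(log_weights)
    return sum(w * d for w, d in zip(weights, draws))


def effective_sample_size(log_weights):
    """Kish-style effective sample size of the normalized weights."""
    weights = self_normalized_weights(log_weights)
    return 1.0 / sum(w * w for w in weights)


rng = random.Random(2025)
# Draws from the toy proposal N(0, 1), standing in for draws obtained
# under a simpler model.
draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Log density ratio of N(1, 1) to N(0, 1) simplifies to theta - 0.5.
log_w = [d - 0.5 for d in draws]

estimate = reweighted_mean(draws, log_w)  # should be close to the target mean 1
ess = effective_sample_size(log_w)
```

The effective sample size gives a quick diagnostic of how much information survives the re-weighting; a value far below the number of draws signals that a few draws dominate the estimate.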
Keywords
Bayesian statistics
complex survey design
finite population
Bayesian bootstrap
As technology expands, so do the data entry methods for survey research. Researchers traditionally relied on manual data entry and more recently have used optical character recognition (OCR) scanning. Although manual data entry yields accurate data, its cost and time remain high. In an effort to reduce cost and increase efficiency, NORC moved from manual data entry to OCR scanning for the 2024-25 round of the Reproductive Health Experiences and Access (RHEA) Survey. During 2025, NORC tested AI-based document processing for paper-and-pencil survey responses. We used Azure AI Document Intelligence and Azure OpenAI large language models (LLMs) to develop an unsupervised approach to data extraction via Markdown format. We compared the accuracy, efficiency, and costs of the Azure AI approach to those of OCR scanning. This paper will share details about the technical process and provide initial findings related to the time, cost, and accuracy of data entry via these two methods.
Keywords
artificial intelligence
generative AI
optical character recognition
data entry
data entry methods