Monday, Aug 4: 8:30 AM - 10:20 AM
4033
Contributed Papers
Music City Center
Room: CC-212
Main Sponsor
Section on Teaching of Statistics in the Health Sciences
Presentations
Homoscedasticity is a key assumption in traditional linear models, but it is frequently violated in real-world data, leading to inefficient coefficient estimates and unreliable inference. This paper introduces variance-guided (VarGuid) regression, a method designed to improve coefficient estimation and uncertainty quantification under heteroscedastic conditions. VarGuid employs an iteratively reweighted least squares approach that integrates data-adaptive weights to account for varying conditional variance structures. We derive the maximum likelihood estimator for the model parameters and establish its theoretical properties. Through simulation studies, we show that VarGuid provides more accurate coefficient estimates and better confidence interval coverage than ordinary least squares in the presence of heteroscedasticity. Additionally, we apply VarGuid to a real-world dataset analyzing factors associated with respiratory-related quality of life in low- and middle-income countries. Our findings highlight the advantages of VarGuid in improving estimation accuracy and inference reliability for heteroscedastic data.
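As a rough illustration of the iteratively reweighted least squares idea described above, the sketch below alternates between a weighted linear fit and re-estimation of observation weights from a working variance model. The log-linear variance model, the fixed iteration count, and all names are assumptions made for this sketch, not the VarGuid specification from the paper.

import numpy as np

def irls_heteroscedastic(X, y, n_iter=10):
    """Illustrative iteratively reweighted least squares for a
    heteroscedastic linear model.

    Assumes log Var(y | x) is roughly linear in x -- a working assumption
    for this sketch, not the variance model specified by VarGuid.
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])              # design with intercept
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]       # OLS starting values
    for _ in range(n_iter):
        resid = y - Xd @ beta
        # regress log squared residuals on the design to get a smooth
        # conditional-variance estimate, then weight by its inverse
        gamma = np.linalg.lstsq(Xd, np.log(resid**2 + 1e-8), rcond=None)[0]
        w = np.exp(-(Xd @ gamma))                      # data-adaptive weights
        WX = Xd * w[:, None]
        beta = np.linalg.solve(Xd.T @ WX, WX.T @ y)    # weighted least squares
    # WLS covariance under the working variance model
    dof = n - Xd.shape[1]
    sigma2 = np.sum(w * (y - Xd @ beta) ** 2) / dof
    cov = sigma2 * np.linalg.inv(Xd.T @ (Xd * w[:, None]))
    return beta, np.sqrt(np.diag(cov))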
Keywords
Heteroscedasticity
Maximum likelihood estimation
Nonlinearity
Learning rates may vary with circumstances and over time, and incentivization through punishments or rewards may affect human skill learning. We consider a state space model for dynamically changing learning rates and estimate the effect of incentivization on these rates using a dynamically weighted particle filter. As an alternative, we consider a functional data analysis of the learning rates and the effect of incentivization. We present the estimated learning rates and the incentivization effects from both approaches and compare their results.
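For readers unfamiliar with particle filtering in this setting, the sketch below runs a plain bootstrap particle filter for a latent, time-varying learning rate. The random-walk dynamics on the logit scale, the performance model, and the noise levels are invented for illustration, and this is the generic bootstrap variant rather than the dynamically weighted particle filter used in the work.

import numpy as np

rng = np.random.default_rng(0)

def particle_filter(y, n_particles=2000, sigma_state=0.05, sigma_obs=0.1):
    """Bootstrap particle filter for a latent, time-varying learning rate.

    Illustrative state-space model, assumed only for this sketch: the
    learning rate follows a random walk on the logit scale, and observed
    performance y[t] moves toward a ceiling of 1.0 at that rate.
    """
    T = len(y)
    logit_alpha = rng.normal(0.0, 1.0, n_particles)    # initial particle cloud
    perf = np.full(n_particles, float(y[0]))            # per-particle performance
    alpha_hat = np.full(T, np.nan)                       # filtered learning rates
    for t in range(1, T):
        logit_alpha += rng.normal(0.0, sigma_state, n_particles)      # state evolution
        alpha = 1.0 / (1.0 + np.exp(-logit_alpha))
        perf = perf + alpha * (1.0 - perf)               # skill approaches the ceiling
        w = np.exp(-0.5 * ((y[t] - perf) / sigma_obs) ** 2) + 1e-300  # likelihood weights
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)          # resample
        logit_alpha, perf = logit_alpha[idx], perf[idx]
        alpha_hat[t] = alpha[idx].mean()                 # posterior-mean learning rate
    return alpha_hat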
Keywords
sequential data analysis
functional data analysis
Bayesian data analysis
Bayesian statistics
The art and science of research demand a dynamic confluence of curiosity, rigor, and resilience. Over the past five years, I have dedicated my efforts to fostering a thriving research culture at a private university through one-hour presentations known as the Research Forum. This endeavor was guided by a commitment to equip students, faculty, and emerging researchers with the tools and confidence needed to excel in their scholarly pursuits. Ten topics a year are presented by me and invited researchers. Topics have included writing research questions, sampling methods to increase sample size, critically reading research articles, navigating ethical issues, imposter syndrome, survey instruments, reliability and validity, data visualization, creating APA tables, surviving graduate school as a non-traditional student, and presenting scientific findings. In conclusion, this five-year journey demonstrates the power of a holistic approach to cultivating a research environment. In presenting these findings at the 2025 Joint Statistical Meetings, I hope to inspire others to embark on similar journeys and foster vibrant research cultures within their institutions.
Keywords
research in higher education
mentoring researchers
statistical consulting
graduate school
imposter syndrome
Effective connectivity (EC) research investigates whether and to what extent functional activity in one brain region causally influences another. Recent studies have shown growing interest in the effects of external intervention on subject-level connectivity. This introduces two layers of causal inference problems: causal relationships among brain regions and the effect of an external intervention on those relationships. Each layer is susceptible to distinct or shared confounding factors. To address confounding in estimating EC, we propose a sample splitting method for time-series data followed by fitting a vector autoregressive model. With the estimated EC, we develop an inverse probability weighting estimator to examine the intervention effect on EC while adjusting for subject-level confounding and accounting for multiple testing. We demonstrate, both in theory and in simulations, that the proposed method is asymptotically valid under certain conditions, effectively controlling type-I error rates and familywise error rates. We apply this approach to resting-state fMRI data from the Alzheimer's Disease Neuroimaging Initiative.
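A highly simplified sketch of the two estimation layers is given below: a least-squares VAR(1) fit standing in for the subject-level effective-connectivity estimate, and an inverse probability weighted contrast of a connectivity coefficient between intervened and non-intervened subjects. The sample-splitting scheme for the time series, the propensity model, and the multiple-testing control from the paper are omitted; all function and variable names are hypothetical.

import numpy as np

def fit_var1(ts):
    """Least-squares VAR(1) fit. `ts` is a (time, regions) array; returns a
    connectivity matrix A with ts[t] approximately equal to A @ ts[t-1].
    A simplified stand-in for the subject-level EC estimate (no sample
    splitting here)."""
    X, Y = ts[:-1], ts[1:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B.T

def ipw_effect(ec, treated, propensity):
    """Inverse probability weighted contrast of a connectivity coefficient
    between intervened and non-intervened subjects. `propensity` would come
    from a model of intervention given subject-level confounders (assumed)."""
    ec, treated, propensity = map(np.asarray, (ec, treated, propensity))
    w1 = treated / propensity
    w0 = (1 - treated) / (1 - propensity)
    return np.sum(w1 * ec) / np.sum(w1) - np.sum(w0 * ec) / np.sum(w0)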
Keywords
causal inference
effective connectivity
fMRI
Alzheimer’s disease
There has been speculation about the relationship between neighborhood characteristics and health literacy, and about whether these relationships vary across racial/ethnic groups. This study linked the 2023 Survey of Racism and Public Health dataset with the 2017-2021 American Community Survey dataset using zip codes to obtain measures of residential segregation, neighborhood deprivation, racial and economic polarization, and racial and educational isolation. The Brief Health Literacy Screen was used to assess participants' health literacy. Associations between neighborhood characteristics and limited health literacy were then examined. Greater neighborhood deprivation was associated with a higher likelihood of limited health literacy, while higher racial and economic polarization was associated with decreased odds of limited health literacy. Greater racial isolation appeared to increase the odds of limited health literacy. These associations did not vary significantly by racial/ethnic group. The findings help reveal potentially critical causal pathways: higher neighborhood deprivation increases the likelihood of limited health literacy, with no variation across racial/ethnic groups within the same neighborhood.
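The association analysis described above could in principle be examined with a logistic model that includes the neighborhood measures and an interaction with racial/ethnic group. The sketch below, using a hypothetical linked analysis file and hypothetical column names, is only a schematic of that kind of model, not the authors' actual specification.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names; the actual survey variables differ.
#   limited_hl : 1 = limited health literacy (Brief Health Literacy Screen)
#   deprivation, polarization, racial_isolation : zip-code-level neighborhood measures
#   race       : self-reported racial/ethnic group
df = pd.read_csv("srph_acs_linked.csv")

model = smf.logit(
    "limited_hl ~ deprivation + polarization + racial_isolation"
    " + deprivation:C(race)",    # interaction tests variation by race/ethnicity
    data=df,
).fit()

print(np.exp(model.params))      # odds ratios
print(np.exp(model.conf_int()))  # 95% CIs on the odds-ratio scale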
Keywords
Structural racism
Health inequities
Neighborhood deprivation
Marginalized populations
Environmental justice
Limited health literacy
Integrating proteomics and clinical data shows great promise for early, precise disease prediction and diagnosis. However, the high dimensionality and small sample sizes of proteomics data pose challenges for machine learning in identifying relevant features. While various statistical methods and pipelines are available, their efficiency, reproducibility, and clinical relevance remain unclear.
This study evaluated nine analysis pipelines using machine learning and dimensionality reduction methods on simulated data of 1,317 proteins from 26 subjects (13 controls, 13 cases). With extremely small sample sizes (n < 30), all pipelines showed high performance metrics, indicating potential overfitting. Although performance metrics were similar, the proteins identified as discriminatory varied across methods. Despite this heterogeneity, the biological pathways and genetic disorders associated with these proteins overlapped. Sensitivity analysis showed that larger sample sizes improved biomarker stability.
While most pipelines perform similarly in distinguishing cohort groups and identifying shared pathways, meticulous model selection is needed to ensure reliable protein identification for downstream studies.
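A minimal version of one such pipeline is sketched below: univariate feature selection and a regularized classifier evaluated with cross-validation, with the selection step kept inside the folds so the small-sample optimism described above is not made worse by information leakage. The data are random placeholders shaped like the simulated design (26 subjects, 1,317 proteins); the specific selector, classifier, and hyperparameters are illustrative choices, not the nine pipelines evaluated in the study.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# X: (26, 1317) protein intensities, y: 13 controls / 13 cases.
# Random stand-in data; only the shapes match the abstract.
rng = np.random.default_rng(1)
X = rng.normal(size=(26, 1317))
y = np.repeat([0, 1], 13)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=20)),          # univariate filter
    ("clf", LogisticRegression(penalty="l2", C=1.0, max_iter=1000)),
])

# Feature selection runs inside each CV fold, avoiding test-label leakage;
# even so, apparent performance can look optimistic at n = 26.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc"))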
Keywords
Machine learning
Proteomics data
Performance metrics
Small sample sizes