Sunday, Aug 3: 4:00 PM - 5:50 PM
4017
Contributed Papers
Music City Center
Room: CC-207A
In this session, presenters will showcase a wide variety of novel techniques for handling categorical data or longitudinal/correlated data and their applications to various research areas.
Main Sponsor
Biometrics Section
Presentations
According to the World Health Organization (WHO), 25% of health problems in children under age five are related to environmental risk factors. The percentage for preterm birth is considered to be even higher, and preterm birth is the leading cause of death in children in the same age group. An estimated 900,000 deaths worldwide in 2019 were related to preterm birth complications. This project studies the association between county-level preterm birth counts in North Dakota and South Dakota and water and air pollution variables, building upon studies that have examined the association of each with preterm births individually. Preterm birth and birth defect counts of fewer than three (SD) and five (ND) are removed from the data due to privacy concerns, which leads us to employ a truncated Poisson regression model. Furthermore, a Bayesian approach is used for parameter estimation to allow for appropriate uncertainty characterization. Results indicate that state, residential county, various behavioral variables, and specific water and air pollutants are significant predictors.
Keywords
Truncated Poisson regression
Bayesian modeling
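As a rough illustration of the truncated Poisson likelihood mentioned in the abstract above, the sketch below assumes a log link and left truncation at a state-specific cutoff (3 for SD, 5 for ND, since smaller counts are removed). The function names and the data layout are hypothetical, and the authors' Bayesian estimation (priors and sampler) is not reproduced here; this is only the likelihood piece such an analysis might build on.

```python
import math

def poisson_logpmf(y, mu):
    """Log of the ordinary Poisson probability P(Y = y) with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def truncated_poisson_logpmf(y, mu, cutoff):
    """Log-probability of y when only counts >= cutoff can be observed."""
    if y < cutoff:
        return float("-inf")
    # P(Y >= cutoff) = 1 - sum of P(Y = k) for k below the cutoff
    tail = 1.0 - sum(math.exp(poisson_logpmf(k, mu)) for k in range(cutoff))
    return poisson_logpmf(y, mu) - math.log(tail)

def log_likelihood(beta, rows, cutoff):
    """rows: iterable of (covariate_vector, count); assumes mu = exp(x . beta)."""
    total = 0.0
    for x, y in rows:
        mu = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        total += truncated_poisson_logpmf(y, mu, cutoff)
    return total
```

In a Bayesian fit, this log-likelihood would be added to the log-prior of `beta` and passed to a sampler of one's choice. With a cutoff of 0 the expression reduces to the ordinary Poisson log-likelihood.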
When modeling clustered data using generalized estimating equations, the selection of a proper correlation structure improves the efficiency of mean structure estimators. QIC and CIC are measures that can be used to perform working correlation structure selection. Both criteria assess the disparity between the robust estimator of the covariance matrix for the estimated mean parameters and a referent: specifically, the model-based covariance matrix estimator arising from the independence model. Such a referent is arguably suboptimal, since the independence working structure is usually inappropriate for clustered data. To address this issue, we propose new discrepancy measures that utilize the general working correlation structure as the referent, which should always be defensible provided that the correlation parameters can be accurately estimated. To facilitate the selection of a suitably parsimonious working correlation structure, we develop and implement a form of Occam's window based on bootstrapping that can be used in conjunction with the criteria.
Keywords
bootstrapping
CIC
generalized estimating equations
model selection
Occam’s window
working correlation structure
Multiple comparative trials with binary outcomes are commonly used in biomedical research and other disciplines to estimate epidemiologic measures by combining information across trials. These measures are commonly estimated as a weighted average of summary statistics based on the 2 × 2 table data from each trial. Three epidemiologic measures are used most frequently: the risk ratio (RR), odds ratio (OR), and risk difference (RD). The RD and RR are preferable because they provide a more meaningful and interpretable measure of treatment effect for a binary outcome. Estimating the overall RD or RR in multiple comparative trials with binary outcomes is challenging, especially when the number of patients in a single trial is small and when the number of events is zero for some trials. To address these situations, we develop efficient estimation procedures for the overall RD and RR in multiple comparative trials with binary outcomes. We illustrate these procedures by analyzing two real-life data sets obtained from multiple comparative trials in biomedical research.
Keywords
Multiple comparative trials
Epidemiologic measures
risk difference
risk ratio
estimation procedures
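To make the "weighted average of summary statistics" in the abstract above concrete, here is a minimal sketch of the classical Mantel-Haenszel pooled RD and RR across 2 × 2 tables. This is a standard textbook estimator chosen for illustration, not the procedures proposed by the authors; the tuple layout and function names are assumptions.

```python
def mantel_haenszel_rd(tables):
    """Pooled risk difference.

    tables: list of (events_trt, n_trt, events_ctl, n_ctl), one tuple per trial.
    Equivalent to averaging per-trial RDs with weights n_trt * n_ctl / N.
    """
    num = sum((a * n0 - c * n1) / (n1 + n0) for a, n1, c, n0 in tables)
    den = sum(n1 * n0 / (n1 + n0) for a, n1, c, n0 in tables)
    return num / den

def mantel_haenszel_rr(tables):
    """Pooled risk ratio.

    Equivalent to averaging per-trial RRs with weights events_ctl * n_trt / N,
    so a single trial with zero events does not make the estimate undefined
    as long as the pooled control-arm sum is non-zero.
    """
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in tables)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in tables)
    return num / den

# Hypothetical data: the second trial has zero events in the treatment arm.
example = [(10, 100, 20, 100), (0, 50, 2, 50)]
rd = mantel_haenszel_rd(example)
rr = mantel_haenszel_rr(example)
```

Because the sums are taken before dividing, trials with zero events contribute without a continuity correction, which is one reason pooled estimators of this form are often used as a baseline for sparse data.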
A confidence interval is unbiased if the probability of covering the true parameter is no less than the probability of false coverage. In the binomial distribution, a nonrandom confidence interval for a binomial proportion may not be unbiased, but it can satisfy local unbiasedness within specific regions of the parameter space. In this study, we propose a method to determine these regions of local unbiasedness. By applying this methodology, we either confirm the unbiasedness of existing confidence intervals or identify the regions where local unbiasedness holds. Additionally, we define the locally unbiased ratio as the total length of these regions divided by the length of the parameter space. Using the locally unbiased ratio as a criterion, we compare the performance of existing intervals and provide recommendations based on our findings.
Keywords
Binomial distribution
Confidence interval
Coverage probability
Locally unbiased
Probability of false coverage
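The comparison between coverage and false coverage described in the abstract above can be checked numerically for any given interval. The sketch below computes, by exact enumeration over the binomial distribution, the probability that an interval contains a given value when the true proportion is `true_p`. The Wilson score interval is used purely as an example, and the function names are hypothetical; the authors' method for locating regions of local unbiasedness is not reproduced.

```python
import math

def wilson_interval(x, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (z defaults to ~95%)."""
    p_hat = x / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def prob_interval_contains(value, true_p, n, interval=wilson_interval):
    """P(interval contains `value`) when X ~ Binomial(n, true_p).

    value == true_p gives the coverage probability;
    value != true_p gives a probability of false coverage.
    """
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= value <= hi:
            total += math.comb(n, x) * true_p**x * (1 - true_p) ** (n - x)
    return total

coverage = prob_interval_contains(0.5, 0.5, 20)
false_coverage = prob_interval_contains(0.9, 0.5, 20)
```

Under the definition in the abstract, unbiasedness at `true_p` would require `coverage` to be at least as large as the false coverage for every other value, so a grid search over `value` is one simple way to probe it for a fixed `n`.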
Multivariate multinomial outcomes are often interdependent, yet most existing research on multinomial regression fits each outcome separately. This approach ignores correlations between outcomes, leading to loss of information and reduced predictive accuracy. Accounting for these correlations requires high-dimensional parameter spaces, which can make model estimation infeasible. This study proposes a multivariate multinomial logit model that captures outcome correlations and reduces the dimension of the parameter space using ANOVA decomposition. The ANOVA decomposition enables explicit conditional model formulations, which in turn allow a computationally much simpler composite likelihood approach. We then develop an efficient Minorization-Maximization (MM) algorithm that incorporates variable selection. Simulation studies demonstrate the method's effectiveness in parameter estimation and variable selection. The model is also applied to real-world data, revealing the correlation structure of multinomial choices. Our method outperforms existing approaches in predicting outcomes, offering significant advantages for predictive modeling and decision-making.
Keywords
Multivariate analysis
Multinomial Logit
Composite Likelihood
ANOVA Decomposition
Correlated Outcomes
Variable Selection
Despite the explosive growth of literature on joint models for longitudinal and time-to-event data, efficient implementations for jointly modeling multiple biomarkers and a time-to-event outcome have lagged behind, and current implementations do not scale to large datasets with tens of thousands to millions of subjects. To address this, we propose a fast approximate expectation-maximization (EM) algorithm for a semiparametric joint model that handles multiple biomarkers and a competing-risks time-to-event outcome. The algorithm uses both customized linear scan algorithms and a normal approximation of the posterior distribution of the random effects, reducing the computational burden by a factor of up to hundreds of thousands compared to existing approaches and often reducing the runtime from days to minutes. We validate the accuracy and efficiency of our approximation method through various simulation studies and further demonstrate its practical application using a real-world large-scale Biobank study.
Keywords
competing risks
massive data
multiple biomarkers
normal approximation
scalable joint models
Analyzing environmental data can be challenging when making predictions due to outliers and other irregularities in the data. Large environmental datasets often contain measurements that deviate from the norm, and these outliers can significantly distort traditional analyses, potentially leading to biased or invalid results. As a result, identifying and addressing outliers is essential. Robust methods can produce reliable results even when the data has skewed, heavy-tailed, or non-normal distributions. These methods provide dependable parameter estimates despite the presence of anomalies, leading to more trustworthy conclusions and decisions.
In this study, we assume that covariates in a Poisson regression model are stochastic, which allows for the inclusion of non-normality and extreme values in the model's systematic component, as commonly found in environmental data. We propose a novel estimation method and compare the performance of the proposed estimators with traditional techniques, demonstrating that the new estimators are indeed robust. Finally, we apply these estimators to a real-life dataset.
Keywords
Outliers
Robustness
Poisson Regression
Stochastic covariates
First Author
Evrim Oral, LSUHSC School of Public Health
Presenting Author
Evrim Oral, LSUHSC School of Public Health