Categorical Data and Longitudinal/Correlated Data Current Research

Fanyu Cui, Chair
Columbia University
 
Sunday, Aug 3: 4:00 PM - 5:50 PM
4017 
Contributed Papers 
Music City Center 
Room: CC-207A 
In this session, presenters will showcase a wide variety of novel techniques for handling categorical data or longitudinal/correlated data, along with their applications to various research areas.

Main Sponsor

Biometrics Section

Presentations

Association between environmental risk factors and preterm birth defects.

According to the World Health Organization (WHO), 25% of health problems among children under age five are related to environmental risk factors. The percentage for preterm birth, the leading cause of death in children in the same age group, is considered to be even higher. An estimated 900,000 deaths worldwide in 2019 were related to preterm birth complications. This project studies the association between county-level preterm birth counts in North Dakota and South Dakota and water and air pollution variables, building on studies that have examined the association of water pollution and air pollution with preterm birth individually. Counts of preterm births and birth defects below three (SD) and five (ND) are removed from the data due to privacy concerns, which leads to a truncated Poisson regression model. A Bayesian approach is used for parameter estimation to allow for appropriate uncertainty characterization. Results indicate that state, residential county, various behavioral variables, and specific water and air pollutants are significant predictors.
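As a rough illustration of why suppressed small counts call for a truncated likelihood, the sketch below evaluates a left-truncated Poisson log-likelihood in which only counts at or above a threshold are observed. The function names, the `lower` argument, and the use of a plain log-likelihood (rather than the Bayesian fit described in the abstract) are assumptions made for illustration, not the authors' implementation.

```python
import math


def poisson_logpmf(y, mu):
    """Log of the ordinary Poisson pmf, P(Y = y) with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def truncated_poisson_logpmf(y, mu, lower):
    """Log of P(Y = y | Y >= lower): counts below `lower` are never observed.

    `lower` is a hypothetical name for the suppression threshold
    (for example 3 or 5, depending on the state).
    """
    if y < lower:
        raise ValueError("observed count lies below the truncation point")
    # Probability mass removed by truncation: P(Y < lower).
    below = sum(math.exp(poisson_logpmf(k, mu)) for k in range(lower))
    return poisson_logpmf(y, mu) - math.log1p(-below)


def truncated_loglik(counts, means, lower):
    """Sum of truncated log-pmf terms; `means` would typically be exp(x'beta)."""
    return sum(truncated_poisson_logpmf(y, m, lower) for y, m in zip(counts, means))
```

In a regression setting each mean would depend on covariates through a log link, and a Bayesian analysis would combine this likelihood with priors on the coefficients.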

Keywords

Truncated Poisson regression

Bayesian modeling 

Co-Author

Hossein Moradi Rekabdarkolaee, South Dakota State University

First Author

Emma Brookman, South Dakota State University

Presenting Author

Emma Brookman, South Dakota State University

Enhanced Working Correlation Structure Selection for the Modeling of Clustered Data using GEEs

When modeling clustered data using generalized estimating equations, the selection of a proper correlation structure improves the efficiency of mean structure estimators. QIC and CIC are measures that can be used to perform working correlation structure selection. Both criteria assess the disparity between the robust estimator of the covariance matrix for the estimated mean parameters and a referent: specifically, the model-based covariance matrix estimator arising from the independence model. Such a referent is arguably suboptimal, since the independence working structure is usually inappropriate for clustered data. To address this issue, we propose new discrepancy measures that utilize the general working correlation structure as the referent, which should always be defensible provided that the correlation parameters can be accurately estimated. To facilitate the selection of a suitably parsimonious working correlation structure, we develop and implement a form of Occam's window based on bootstrapping that can be used in conjunction with the criteria. 
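For orientation, the two existing criteria are usually written as below. The notation is an assumption, not taken from the abstract: R is the candidate working correlation structure, Q is the quasi-likelihood evaluated under independence, \hat{\Omega}_I is the inverse of the model-based covariance estimator under the independence model, and \hat{V}_R is the robust (sandwich) covariance estimator under R. The abstract does not give the form of the proposed measures, so they are not shown.

```latex
\mathrm{QIC}(R) = -2\, Q\bigl(\hat{\beta}(R);\, I\bigr)
                  + 2\, \operatorname{tr}\bigl(\hat{\Omega}_I \hat{V}_R\bigr),
\qquad
\mathrm{CIC}(R) = \operatorname{tr}\bigl(\hat{\Omega}_I \hat{V}_R\bigr)
```

In both expressions the trace term compares the robust covariance estimator with the independence-based referent, which is the component the abstract proposes to change.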

Keywords

bootstrapping

CIC

generalized estimating equations

model selection

Occam’s window

working correlation structure 

Co-Author

Joseph Cavanaugh, University of Iowa

First Author

Daniel Boonstra, University of Iowa

Presenting Author

Daniel Boonstra, University of Iowa

Estimation of Some Epidemiological Measures of Association in Multiple Comparative Trials

Multiple comparative trials with binary outcomes are commonly used in biomedical research and other disciplines to estimate epidemiologic measures of association by combining information across trials. These measures are commonly estimated as a weighted average of summary statistics based on the 2 × 2 table data from each trial. Three measures are used most frequently: the risk ratio (RR), odds ratio (OR), and risk difference (RD). The RD and RR are preferable because they give a more meaningful and interpretable measure of treatment effect for a binary outcome. Estimating the overall RD or RR in multiple comparative trials with binary outcomes is challenging, especially when the number of patients in a single trial is small and when some trials have zero events. To address these situations, we develop efficient estimation procedures for the overall RD and RR in multiple comparative trials with binary outcomes. We illustrate these procedures by analyzing two real-life data sets obtained from multiple comparative trials in biomedical research.
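To make the "weighted average of 2 × 2 summary statistics" concrete, the sketch below computes the classical Mantel-Haenszel pooled RD and RR. This is a standard textbook estimator swapped in for illustration, not the procedure proposed in the abstract; the tuple layout and the example counts are made up.

```python
def mantel_haenszel_rd(trials):
    """Pooled risk difference: weighted mean of per-trial RDs.

    Each trial is (events_trt, n_trt, events_ctl, n_ctl), a hypothetical layout.
    The weight for trial i is n_trt * n_ctl / (n_trt + n_ctl).
    """
    num = 0.0
    den = 0.0
    for a, n1, c, n0 in trials:
        total = n1 + n0
        num += (a * n0 - c * n1) / total  # weight times (a/n1 - c/n0)
        den += n1 * n0 / total            # weight
    return num / den


def mantel_haenszel_rr(trials):
    """Pooled risk ratio; defined as long as the pooled control term is nonzero."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in trials)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in trials)
    return num / den


# Made-up counts for illustration; the second trial has zero events in both arms.
example_trials = [(2, 10, 1, 10), (0, 20, 0, 20), (3, 15, 1, 15)]
pooled_rd = mantel_haenszel_rd(example_trials)
pooled_rr = mantel_haenszel_rr(example_trials)
```

The zero-event trial has an undefined risk ratio on its own, yet it still enters the pooled RD through its weight. This hints at why small trials and zero counts complicate estimation.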

Keywords

Multiple comparative trials

Epidemiologic measures

risk difference

risk ratio

estimation procedures 

Co-Author

Soumik Banerjee, Canisius University

First Author

Krishna Saha, Central Connecticut State University

Presenting Author

Krishna Saha, Central Connecticut State University

Local Unbiasedness of Confidence Intervals for a Binomial Proportion

A confidence interval is unbiased if the probability of covering the true parameter is no less than the probability of false coverage. In the binomial distribution, a nonrandom confidence interval for a binomial proportion may not be unbiased, but it can satisfy local unbiasedness within specific regions of the parameter space. In this study, we propose a method to determine these regions of local unbiasedness. By applying this methodology, we either confirm the unbiasedness of existing confidence intervals or identify the regions where local unbiasedness holds. Additionally, we define the locally unbiased ratio as the total length of these regions divided by the length of the parameter space. Using the locally unbiased ratio as a criterion, we compare the performance of existing intervals and provide recommendations based on our findings. 
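The definition of unbiasedness can be checked numerically at any pair of parameter values. The sketch below computes the probability that an interval covers a target value when the data come from a given true proportion. The Wilson score interval and all function names are illustrative assumptions; the abstract does not say which intervals are studied or how the regions are found.

```python
import math


def binom_pmf(x, n, p):
    """Binomial probability P(X = x) for n trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def wilson_interval(x, n, z=1.96):
    """Wilson score interval, used here only as an example of a nonrandom interval."""
    phat = x / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def cover_prob(n, p_true, p_target, interval=wilson_interval):
    """P(interval contains p_target) when X ~ Binomial(n, p_true).

    With p_target == p_true this is the usual coverage probability;
    with p_target != p_true it is a probability of false coverage.
    """
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= p_target <= hi:
            total += binom_pmf(x, n, p_true)
    return total
```

Comparing `cover_prob(n, p, p)` with `cover_prob(n, p, q)` over a grid of false values `q` is one simple way to explore where the unbiasedness inequality holds for a given interval.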

Keywords

Binomial distribution

Confidence interval

Coverage probability

Locally unbiased

Probability of false coverage 

Co-Author

Zong-Lin Lin, National Yang Ming Chiao Tung University

First Author

Chung-Han Lee, National Cheng Kung University

Presenting Author

Chung-Han Lee, National Cheng Kung University

Multivariate Multinomial Logit Model with ANOVA Decomposition for Correlated Outcomes

Multivariate multinomial outcomes are often interdependent, yet most existing research on multinomial regression fits each outcome separately. This approach ignores correlations between outcomes, leading to loss of information and reduced predictive accuracy. Accounting for these correlations requires high-dimensional parameter spaces, making model estimation infeasible. This study proposes a multivariate multinomial logit model that captures outcome correlations and reduces parameter space dimension using ANOVA decomposition. The ANOVA decomposition enables explicit conditional model formulations, which allows a computationally much simpler composite likelihood approach. Then an efficient Minorization-Maximization (MM) algorithm that incorporates variable selection is developed. Simulation studies evaluate our method, demonstrating its effectiveness in parameter estimation and variable selection. The model is also applied to real-world data, revealing the correlation structure of multinomial choices. Our method outperforms existing approaches in predicting outcomes, offering significant advantages for predictive modeling and decision-making. 

Keywords

Multivariate analysis

Multinomial Logit

Composite Likelihood

ANOVA Decomposition

Correlated Outcomes

Variable Selection 

Co-Author(s)

Wenbin Lu, North Carolina State University
Luo Xiao
Xinming An, UNC-Chapel Hill

First Author

Sohyeon Kim, North Carolina State University

Presenting Author

Sohyeon Kim, North Carolina State University

Scalable Joint Modeling of Multiple Biomarkers and Survival Outcomes for Massive Biobank Data

Despite the explosive growth of the literature on joint models that link longitudinal and time-to-event data, efficient implementations for jointly modeling multiple biomarkers and a time-to-event outcome have lagged behind, and current implementations do not scale to large datasets with tens of thousands to millions of subjects. To address this, we propose a fast approximate expectation-maximization (EM) algorithm for a semiparametric joint model that handles multiple biomarkers and a competing risks time-to-event outcome. The algorithm uses both customized linear scan algorithms and a normal approximation of the posterior distribution of the random effects, reducing the computational burden by a factor of up to hundreds of thousands compared to existing approaches and often reducing the runtime from days to minutes. We validate the accuracy and efficiency of our approximation method through various simulation studies and demonstrate its practical application using a real-world large-scale biobank study.

Keywords

competing risks

massive data

multiple biomarkers

normal approximation

scalable joint models 

Co-Author(s)

Emily Ouyang, University of California, Riverside
Jin Zhou, UCLA
Xinping Cui, University of California-Riverside
Gang Li, University of California-Los Angeles

First Author

Shanpeng Li, City of Hope

Presenting Author

Emily Ouyang, University of California, Riverside

Stochastic Covariates in Poisson Regression

Analyzing environmental data can be challenging when making predictions due to outliers and other irregularities in the data. Large environmental datasets often contain measurements that deviate from the norm, and these outliers can significantly distort traditional analyses, potentially leading to biased or invalid results. As a result, identifying and addressing outliers is essential. Robust methods can produce reliable results even when the data has skewed, heavy-tailed, or non-normal distributions. These methods provide dependable parameter estimates despite the presence of anomalies, leading to more trustworthy conclusions and decisions.
In this study, we assume that the covariates in a Poisson regression model are stochastic, which allows non-normality and extreme values, as commonly found in environmental data, to enter the model's systematic component. We propose a novel estimation method and compare the performance of the proposed estimators with traditional techniques, demonstrating that the new estimators are indeed robust. Finally, we apply these estimators to a real-life dataset.

Keywords

Outliers

Robustness

Poisson Regression

Stochastic covariates 

First Author

Evrim Oral, LSUHSC School of Public Health

Presenting Author

Evrim Oral, LSUHSC School of Public Health