Tuesday, Aug 5: 8:30 AM - 10:20 AM
4083
Contributed Papers
Music City Center
Room: CC-106B
Main Sponsor
Korean International Statistical Society
Presentations
In healthcare, developing personalized treatment strategies is essential for optimizing patient outcomes, particularly when dealing with censored survival data. This study introduces the Dynamic Deep Buckley-James Q-Learning algorithm, a novel methodology that integrates reinforcement learning with the Buckley-James method to manage censored data effectively. By leveraging deep learning techniques, the algorithm enhances the predictive accuracy of survival times in complex, non-linear settings, optimizing treatment decisions based on imputed outcomes. Our comprehensive simulation study, which includes scenarios with missing at random (MAR) data, not missing at random (NMAR) data, and right-censoring, demonstrates the algorithm's robust performance. The ability to handle various types of missing and censored data ensures wide applicability across different clinical contexts. By addressing the complexities and challenges associated with censoring and missing data in survival analysis, the algorithm learns policies that maximize the expected total imputed survival reward for patients. This enables the comparison of imputed survival times across different treatments, a feature not possible with approaches that discard censored observations rather than imputing them.
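The algorithm itself is described only at a high level above. As a rough illustration of its two ingredients, the sketch below combines a Buckley-James step, which replaces censored log survival times by their conditional expectations using a Kaplan-Meier estimate of the residual distribution, with a single-stage Q-learning step that fits a working outcome model per treatment arm on the imputed rewards. The linear working model, the single-stage setting, and names such as buckley_james_impute are illustrative assumptions, not details from the paper (which uses deep learning for the Q-function).

```python
import numpy as np

def km_residual_curve(resid, event):
    """Kaplan-Meier estimate of the residual distribution: jump sizes at each residual."""
    order = np.argsort(resid)
    r, d = resid[order], event[order].astype(float)
    at_risk = len(r) - np.arange(len(r))
    surv = np.cumprod(1.0 - d / at_risk)
    jumps = -np.diff(np.concatenate(([1.0], surv)))  # probability mass at each residual
    return r, jumps

def buckley_james_impute(X, time, event, max_iter=20, tol=1e-6):
    """Illustrative Buckley-James step under a linear working model log T = X beta + e:
    censored log times are replaced by E[log T | log T > log C, X]."""
    y = np.log(time).astype(float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    y_star = y.copy()
    for _ in range(max_iter):
        resid = y - X @ beta
        r, jumps = km_residual_curve(resid, event)
        for i in np.where(event == 0)[0]:
            mask = r > resid[i]
            if jumps[mask].sum() > 0:
                # conditional mean residual beyond the censoring residual
                y_star[i] = X[i] @ beta + np.average(r[mask], weights=jumps[mask])
        beta_new = np.linalg.lstsq(X, y_star, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return y_star, beta_new
        beta = beta_new
    return y_star, beta

def q_learning_policy(X, treat, y_star):
    """Single-stage Q-learning on imputed rewards: fit one working model per arm
    and recommend the arm with the larger predicted imputed log survival time."""
    q = {a: np.linalg.lstsq(X[treat == a], y_star[treat == a], rcond=None)[0]
         for a in np.unique(treat)}
    return lambda x: max(q, key=lambda a: x @ q[a])
```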
Keywords
Deep Learning
Q-Learning
Imputation
Dynamic Treatment Regime
Hawkes processes are commonly used to capture clustered structures in point pattern data, as they allow each event to elevate the chance of subsequent event occurrences. However, this triggering mechanism is difficult to model accurately when spatial information is measured at varying levels of precision. A common strategy is to use only events with the most precise geolocation, but this can lead to both a loss of information and inaccurate estimates of the underlying triggering structure. In this research, we propose a novel framework that retains events with less precise location data by incorporating location-relevant marks as surrogate measures of spatial information. We integrate this surrogate into nonparametric intensity estimation through a modified weighting scheme in the Model-Independent Stochastic Declustering algorithm. Simulation studies verify that the proposed method can recover the triggering structure more accurately than standard approaches. We further illustrate its usefulness with an application to real-world data, demonstrating how the suggested framework can enhance our understanding of space-time clustering by carefully incorporating imprecise events.
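The modified weighting scheme for imprecisely located events is the paper's contribution and is not reproduced here; the sketch below only illustrates the standard declustering probabilities that the Model-Independent Stochastic Declustering (MISD) algorithm iterates on, assuming, purely for illustration, a parametric exponential-in-time, Gaussian-in-space triggering kernel. In MISD proper, these kernels are re-estimated nonparametrically from the weights at each iteration.

```python
import numpy as np

def declustering_probabilities(t, x, y, mu, kappa, beta, sigma):
    """One declustering step for a space-time Hawkes process.

    Illustrative parametric triggering kernel:
        g(dt, dx, dy) = kappa * beta * exp(-beta*dt)
                        * exp(-(dx^2 + dy^2) / (2*sigma^2)) / (2*pi*sigma^2)

    Returns P where P[i, j] (j < i) is the probability that event j triggered
    event i, and P[i, i] is the probability that event i is a background event.
    """
    n = len(t)
    P = np.zeros((n, n))
    for i in range(n):
        lam = mu  # background contribution to the conditional intensity at event i
        for j in range(i):
            dt, dx, dy = t[i] - t[j], x[i] - x[j], y[i] - y[j]
            g = (kappa * beta * np.exp(-beta * dt)
                 * np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
                 / (2.0 * np.pi * sigma**2))
            P[i, j] = g
            lam += g
        P[i, i] = mu
        P[i, :i + 1] /= lam  # normalize: triggering vs. background probabilities
    return P
```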
Keywords
Spatio-temporal point process
Geolocation error
Two-phase analysis
Terrorism data
Hawkes process
The case-cohort study design provides a cost-effective approach for large cohort studies with competing risks outcomes. The additive subdistribution hazards model assesses direct covariate effects on the cumulative incidence when interest lies in risk differences between groups rather than relative risks. Left truncation, which commonly occurs in biomedical studies, introduces additional complexities to the analysis.
Existing inverse-probability-weighting methods for case-cohort studies with competing risks estimate the coefficients of baseline covariates inefficiently, and they do not address left truncation.
To improve the efficiency of estimating the coefficients of baseline covariates and to account for left-truncated competing risks data, we propose an augmented inverse-probability-weighted estimating equation for left-truncated competing risks data under additive subdistribution hazards models in the case-cohort study design. For multiple case-cohort studies, we further improve estimation efficiency by incorporating extra information from the other causes. We study the large-sample properties of the proposed estimator.
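The augmentation term and the left-truncation weights are specific to the paper and are not reproduced here. As background, the LaTeX block below records the generic additive subdistribution hazards model and a schematic Lin-Ying-type weighted estimating equation of the kind that inverse-probability-weighted case-cohort approaches build on, with w_i(t) standing in for the design/truncation weight.

```latex
% Additive subdistribution hazards model for the cause-1 cumulative incidence
\lambda_1(t \mid Z) = \lambda_{10}(t) + \beta^{\top} Z(t)

% Schematic weighted estimating equation (Lin--Ying type), with w_i(t) the
% case-cohort weight, N_i the cause-1 subdistribution counting process, and
% Y_i the corresponding at-risk process:
U(\beta) = \sum_{i=1}^{n} \int_{0}^{\tau} w_i(t)
  \left\{ Z_i(t) - \bar{Z}_w(t) \right\}
  \left\{ \mathrm{d}N_i(t) - Y_i(t)\,\beta^{\top} Z_i(t)\,\mathrm{d}t \right\} = 0,
\qquad
\bar{Z}_w(t) = \frac{\sum_j w_j(t)\, Y_j(t)\, Z_j(t)}{\sum_j w_j(t)\, Y_j(t)}.
```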
Keywords
Additive subdistribution hazards model
Case-cohort study design
Competing risks
Efficiency
Left-truncation
Stratified data
First Author
Xi Fang, Yale University
Presenting Author
Soyoung Kim, Medical College of Wisconsin
In this talk, we introduce a new generalized linear model based on the fractional binomial distribution. Zero-inflated Poisson and negative binomial distributions are used for count data with many zeros, and zero-inflated Poisson/negative binomial regression models are widely used to analyze the association of such count variables with covariates. In this work, we develop a regression model with the fractional binomial distribution that can serve as an additional tool for modeling count data with excess zeros. The consistency of the ML estimators is proved under certain conditions, and the performance of the estimators is investigated with simulation results. Applications are provided with datasets from horticulture and public health, and the results show that on some occasions our model outperforms the existing zero-inflated regression models.
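The fractional binomial pmf and the fitting of the proposed GLM are specific to the paper and are not sketched here. The snippet below only shows, on simulated stand-in data, how the zero-inflated Poisson and negative binomial baselines mentioned above can be fit with statsmodels, so that a new model could be benchmarked against them (e.g., by AIC).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import (
    ZeroInflatedPoisson,
    ZeroInflatedNegativeBinomialP,
)

# Simulated stand-in for a count outcome with excess zeros (not the paper's data).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)
lam = np.exp(0.3 + 0.5 * x)
structural_zero = rng.random(n) < 0.4
y = np.where(structural_zero, 0, rng.poisson(lam))

# Zero-inflated baselines; the fractional binomial GLM would be fit by
# maximizing its own likelihood on the same data and compared, e.g., via AIC.
zip_fit = ZeroInflatedPoisson(y, X, exog_infl=X, inflation="logit").fit(
    method="bfgs", maxiter=200, disp=0)
zinb_fit = ZeroInflatedNegativeBinomialP(y, X, exog_infl=X, inflation="logit").fit(
    method="bfgs", maxiter=200, disp=0)
print("ZIP AIC:", zip_fit.aic, "ZINB AIC:", zinb_fit.aic)
```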
Keywords
Zero-inflated regression models
Count data with excess zeros
Fractional binomial distribution
Co-Author
Chloe Breece, University of North Carolina Wilmington
First Author
Jeonghwa Lee, University of North Carolina Wilmington, USA
Presenting Author
Jeonghwa Lee, University of North Carolina Wilmington, USA
Matching in observational studies estimates causal effects by balancing covariate distributions between treated and control groups. Traditional methods rely on pairwise distances, but in high-dimensional, low-sample size settings, the curse of dimensionality makes it difficult to distinguish observations. To address this, we propose a novel matching method using genetic algorithms, shifting focus from individual- to group-level distances. Our method improves causal effect estimation by optimizing the similarity of high-dimensional joint covariate distributions. This approach has key advantages: (1) it avoids dimension reduction, preserving full covariate information without additional modeling; (2) it maintains transparency by not relying on outcomes, akin to traditional matching; and (3) it is robust in low-sample size settings, where traditional methods may struggle. Moreover, our results show the proposed method is competitive with existing approaches even in low-dimensional cases. Through simulations and real data applications, we validate its performance, offer practical guidance, and highlight its potential as a tool for causal inference in high- and low-dimensional settings.
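The authors' group-level objective and genetic-algorithm configuration are not specified in the abstract. The sketch below illustrates the general idea under assumptions of my choosing: a binary chromosome selects a control subset of the same size as the treated group, fitness is the negative energy distance between the treated and selected-control covariate distributions (one possible group-level balance measure), and a basic genetic algorithm with tournament selection, uniform crossover, and bit-flip mutation searches over subsets.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_distance(A, B):
    """Energy distance between two samples of covariate vectors (rows)."""
    d = lambda U, V: np.linalg.norm(U[:, None, :] - V[None, :, :], axis=-1).mean()
    return 2 * d(A, B) - d(A, A) - d(B, B)

def ga_match(X_treat, X_ctrl, pop_size=60, gens=200, mut_rate=0.02):
    """Select len(X_treat) controls whose joint covariate distribution is close
    to the treated group's, measured by energy distance (group-level objective)."""
    n_t, n_c = len(X_treat), len(X_ctrl)

    def random_chrom():
        chrom = np.zeros(n_c, dtype=bool)
        chrom[rng.choice(n_c, size=n_t, replace=False)] = True
        return chrom

    def repair(chrom):
        # keep exactly n_t selected controls
        on, off = np.flatnonzero(chrom), np.flatnonzero(~chrom)
        if len(on) > n_t:
            chrom[rng.choice(on, size=len(on) - n_t, replace=False)] = False
        elif len(on) < n_t:
            chrom[rng.choice(off, size=n_t - len(on), replace=False)] = True
        return chrom

    def fitness(chrom):
        return -energy_distance(X_treat, X_ctrl[chrom])

    pop = [random_chrom() for _ in range(pop_size)]
    fit = np.array([fitness(c) for c in pop])
    for _ in range(gens):
        children = []
        for _ in range(pop_size):
            # tournament selection of two parents
            i, j = rng.choice(pop_size, size=2), rng.choice(pop_size, size=2)
            p1 = pop[i[np.argmax(fit[i])]]
            p2 = pop[j[np.argmax(fit[j])]]
            # uniform crossover + bit-flip mutation, then repair subset size
            mask = rng.random(n_c) < 0.5
            child = np.where(mask, p1, p2)
            child ^= rng.random(n_c) < mut_rate
            children.append(repair(child))
        pop = children
        fit = np.array([fitness(c) for c in pop])
    best = pop[int(np.argmax(fit))]
    return np.flatnonzero(best)  # indices of matched controls
```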
Keywords
Matching
High-dimensional data
Genetic algorithms
Covariate balance
Low-sample size settings
Causal inference
Erlang mixture models are essential tools for modeling insurance losses and evaluating aggregated risk measures. However, finding the maximum likelihood estimate (MLE) of Erlang mixtures is challenging because the shape parameters lie in a discrete space, which complicates the application of the standard expectation-maximization (EM) algorithm commonly used for mixture models. Although alternative algorithms have been proposed to compute the MLE of Erlang mixtures, they are often restricted to parametric models and tend to converge to local maxima of the likelihood function. In this study, we focus on the nonparametric Erlang mixture model, which offers greater flexibility than parametric models, and introduce an algorithm to estimate the nonparametric maximum likelihood estimate (NPMLE) of Erlang mixtures. By exploiting the gradient function, this method efficiently identifies critical support points, improving the chances of finding the global maximizer. Numerical studies demonstrate that our approach is more stable and accurate in estimating the MLE for Erlang mixture models than existing methods.
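The proposed algorithm is not reproduced here; the sketch below only illustrates the two standard ingredients the abstract refers to: the gradient (directional-derivative) function of the mixture log-likelihood evaluated over candidate Erlang shapes, which is used to add support points, and EM-type updates of the mixing weights on the current support. Erlang(k, theta) densities are taken as gamma densities with integer shape via scipy, the common scale theta is treated as fixed for simplicity, and names such as fit_npmle_support are illustrative.

```python
import numpy as np
from scipy.stats import gamma

def mixture_density(x, shapes, weights, scale):
    """Density of a finite Erlang (integer-shape gamma) mixture at the points x."""
    comp = np.array([gamma.pdf(x, a=k, scale=scale) for k in shapes])
    return weights @ comp

def gradient_function(x, shape_grid, cur_shapes, cur_weights, scale):
    """Directional derivative toward Erlang(k, scale):
       D(k) = sum_i f(x_i; k) / f_hat(x_i) - n.
       The NPMLE is characterized by D(k) <= 0 for all candidate shapes k."""
    f_hat = mixture_density(x, cur_shapes, cur_weights, scale)
    return np.array([np.sum(gamma.pdf(x, a=k, scale=scale) / f_hat) - len(x)
                     for k in shape_grid])

def em_weight_update(x, shapes, weights, scale, n_iter=50):
    """EM updates of the mixing weights on a fixed support of shapes."""
    for _ in range(n_iter):
        comp = np.array([gamma.pdf(x, a=k, scale=scale) for k in shapes])  # (m, n)
        post = weights[:, None] * comp
        post /= post.sum(axis=0, keepdims=True)  # posterior component memberships
        weights = post.mean(axis=1)
    return weights

def fit_npmle_support(x, scale, max_shape=50, tol=1e-6):
    """Illustrative support-point search: repeatedly add the candidate shape with
    the largest positive gradient, then re-run the weight EM, until none remains."""
    shapes, weights = [1], np.array([1.0])
    grid = np.arange(1, max_shape + 1)
    while True:
        D = gradient_function(x, grid, shapes, weights, scale)
        k_new = int(grid[np.argmax(D)])
        if D.max() <= tol or k_new in shapes:
            return np.array(shapes), weights
        shapes.append(k_new)
        weights = em_weight_update(x, shapes, np.append(weights * 0.9, 0.1), scale)
```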
Keywords
Erlang mixtures
nonparametric mixtures
NPMLE
gradient function
EM algorithm