Sunday, Aug 3: 2:00 PM - 3:50 PM
0192
Invited Paper Session
Music City Center
Room: CC-104B
Statistical Reinforcement Learning (SRL), Healthcare and Precision Medicine, Machine Learning and Computer Science, Business and Economics.
Applied
No
Main Sponsor
Section on Statistical Learning and Data Science
Co Sponsors
Caucus for Women in Statistics
International Chinese Statistical Association
JASA Applications and Case Studies
Presentations
Uncontrolled glycated hemoglobin (HbA1c) levels are associated with adverse events among complex diabetic patients. These adverse events pose serious health risks to affected patients and carry significant financial costs. Thus, a high-quality predictive model that identifies high-risk patients so as to inform preventative treatment has the potential to improve patient outcomes while reducing healthcare costs. Because the biomarker information needed to predict risk is costly and burdensome to collect, it is desirable that such a model gather only as much information on each patient as is needed to render an accurate prediction. We propose a sequential predictive model that uses accumulating patient longitudinal data to classify patients as high-risk, low-risk, or uncertain. Patients classified as high-risk are recommended for preventative treatment, those classified as low-risk are recommended for standard care, and patients classified as uncertain are monitored until a high-risk or low-risk determination is made. We construct the model using Medicare claims and enrollment files linked with patient electronic health record (EHR) data. The proposed model uses functional principal components to accommodate noisy longitudinal data and weighting to address missingness and sampling bias. The proposed method demonstrates higher predictive accuracy and lower cost than competing methods in a series of simulation experiments and in an application to data on complex patients with diabetes.
Keywords
Classification with reject option, Cost-effective, Electronic health records, Functional principal component analysis, Reinforcement learning
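A minimal sketch of the sequential classification-with-reject-option idea described in the abstract above, assuming synthetic HbA1c trajectories, ordinary PCA on a common visit grid in place of functional principal components, and illustrative probability thresholds rather than the method's cost-derived cutoffs:

```python
# Sketch only (not the authors' implementation): at each monitoring time we
# score a patient's accumulated HbA1c trajectory and either flag high-risk,
# clear as low-risk, or defer ("uncertain") and keep monitoring. FPCA is
# approximated by PCA on trajectories observed on a common grid; the
# thresholds p_hi and p_lo are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training data: 500 patients, HbA1c observed at 12 visits.
n, T = 500, 12
trajectories = 7.0 + rng.normal(0, 0.8, (n, T)).cumsum(axis=1) * 0.3
labels = (trajectories[:, -4:].mean(axis=1) > 8.0).astype(int)  # adverse-event proxy

pca = PCA(n_components=3).fit(trajectories)          # stand-in for FPCA scores
clf = LogisticRegression().fit(pca.transform(trajectories), labels)

def classify(partial_traj, p_hi=0.8, p_lo=0.2):
    """Classify a partially observed trajectory; defer if evidence is weak."""
    # Carry the last observation forward to fill the unobserved tail.
    filled = np.r_[partial_traj, np.full(T - len(partial_traj), partial_traj[-1])]
    p = clf.predict_proba(pca.transform(filled[None, :]))[0, 1]
    if p >= p_hi:
        return "high-risk"        # recommend preventative treatment
    if p <= p_lo:
        return "low-risk"         # recommend standard care
    return "uncertain"            # continue collecting biomarker data

print(classify(np.array([7.1, 7.4, 7.9, 8.3, 8.8])))
```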
With the increasing focus on improving personal health and fitness using smart devices and wearables, it is crucial to create mobile clinical decision support systems. In this work, we consider the development of personalized policies that allow different intervention recommendations for individuals with the same observed features. A personalized policy represents a paradigm shift from one decision rule for all users to an individualized decision rule for each user. Aiming to optimize the expected rewards, we propose a generalized linear mixed modeling framework in which population effects and individual deviations from those effects are modeled as fixed and random effects, respectively, and synthesized to form the personalized policy. We introduce a contextual bandit algorithm to learn the personalized policies. The approach is theoretically justified via a regret bound and illustrated using mobile apps, with the goal of maximizing the push-notification response rate given past app usage and other contextual factors.
Keywords
Contextual bandits, generalization error bound
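A minimal sketch, on invented data and with a simplified Bayesian linear model, of how a contextual bandit can combine a pooled population (fixed) effect with per-user (random) deviations to form a personalized push-notification policy; the features, variances, and Thompson-sampling rule below are illustrative assumptions, not the authors' algorithm:

```python
# Sketch: each user's coefficients are shrunk toward a shared population
# estimate, mimicking the fixed-effect + random-effect decomposition in the
# abstract. Rewards are simulated notification responses.
import numpy as np

rng = np.random.default_rng(1)
d, n_users = 3, 20
sigma2, tau2 = 1.0, 0.5          # reward noise and random-effect variance (assumed)

true_pop = np.array([0.5, -0.3, 0.2])                      # population effect
true_user = true_pop + rng.normal(0, 0.3, (n_users, d))    # per-user deviations

# Pooled (fixed-effect) statistics and per-user (random-effect) statistics.
XtX_pool, Xty_pool = np.eye(d) * 1e-3, np.zeros(d)
XtX_user = [np.zeros((d, d)) for _ in range(n_users)]
Xty_user = [np.zeros(d) for _ in range(n_users)]

def personalized_posterior(u):
    """Posterior over user u's coefficients: population prior + user's own data."""
    pop_mean = np.linalg.solve(XtX_pool, Xty_pool)          # fixed-effect estimate
    prec = np.eye(d) / tau2 + XtX_user[u] / sigma2
    mean = np.linalg.solve(prec, pop_mean / tau2 + Xty_user[u] / sigma2)
    return mean, np.linalg.inv(prec)

for t in range(3000):
    u = rng.integers(n_users)
    x = rng.normal(size=d)                                  # contextual features
    mean, cov = personalized_posterior(u)
    theta = rng.multivariate_normal(mean, cov)              # Thompson sampling
    if x @ theta > 0:                                       # send the notification
        r = x @ true_user[u] + rng.normal(0, np.sqrt(sigma2))
        XtX_pool += np.outer(x, x); Xty_pool += x * r
        XtX_user[u] += np.outer(x, x); Xty_user[u] += x * r
```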
In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when sample sizes are limited. Existing transfer learning methods focus primarily on linear regression settings and are not directly applicable to reinforcement learning algorithms. This paper pioneers the study of transfer learning for dynamic decision scenarios modeled by non-stationary finite-horizon Markov decision processes, using neural networks as powerful function approximators together with adaptive algorithmic learning. We demonstrate that naive sample pooling strategies, effective in regression settings, fail in Markov decision processes. To address this challenge, we introduce a novel "re-weighted targeting procedure" to construct "transferable RL samples" and propose "transfer deep $Q^*$-learning", enabling neural network approximation with theoretical guarantees. We assume that the reward functions are transferable and address both the case in which the transition density ratios are transferable and the case in which they are not. Our analytical techniques for transfer learning with neural network approximation and transition probability transfer have broader implications, extending to supervised transfer learning with neural networks and domain shift scenarios. Empirical experiments on both synthetic and real datasets corroborate the advantages of our method, showcasing its potential for improving decision-making through strategically constructed transferable RL samples in non-stationary reinforcement learning contexts.
Keywords
Finite-horizon Markov decision processes; Non-stationary; Backward inductive $Q^*$-learning; Transfer learning; Neural network approximation
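A minimal sketch of backward-inductive Q-learning in which source-population transitions are re-weighted before being pooled with target-population transitions, under strong simplifying assumptions (a linear-in-features Q in place of the deep networks, a known rather than estimated transition-density ratio, and an invented one-dimensional environment); it illustrates the pooling of re-weighted source samples at each stage of the backward induction rather than the paper's transfer deep $Q^*$-learning algorithm itself:

```python
# Sketch: finite-horizon backward induction with re-weighted source samples.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
H, n_tgt, n_src, actions = 3, 200, 2000, [0, 1]

def simulate(n, shift):
    """Per-stage transition tuples (s, a, r, s_next); `shift` moves the source
    population's transition dynamics away from the target's."""
    data = []
    for h in range(H):
        s = rng.normal(size=n)
        a = rng.integers(2, size=n)
        s_next = 0.8 * s + 0.5 * a + shift + rng.normal(0, 0.3, n)
        r = -(s_next ** 2)                      # shared (transferable) reward
        data.append((s, a, r, s_next))
    return data

target, source = simulate(n_tgt, 0.0), simulate(n_src, 0.4)

def features(s, a):
    return np.c_[np.ones_like(s), s, a, s * a]

Q_next = None
for h in reversed(range(H)):
    s_t, a_t, r_t, sn_t = target[h]
    s_s, a_s, r_s, sn_s = source[h]
    # Re-weighting: weight source transitions by the (here, assumed known)
    # transition-density ratio so their next states mimic the target population.
    m = 0.8 * s_s + 0.5 * a_s
    w = np.exp(-((sn_s - m) ** 2 - (sn_s - m - 0.4) ** 2) / (2 * 0.3 ** 2))
    s = np.r_[s_t, s_s]; a = np.r_[a_t, a_s]; r = np.r_[r_t, r_s]
    sn = np.r_[sn_t, sn_s]; wt = np.r_[np.ones(n_tgt), w]
    if Q_next is None:
        y = r                                   # terminal stage: target is the reward
    else:
        y = r + np.max([Q_next.predict(features(sn, np.full_like(sn, b)))
                        for b in actions], axis=0)
    Q_next = Ridge(alpha=1.0).fit(features(s, a), y, sample_weight=wt)
```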
Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or "pooling" data across users allows adaptive sampling algorithms to potentially learn faster. However, by pooling, these algorithms induce dependence between the sampled user data trajectories; we show that this can cause standard variance estimators for i.i.d. data to underestimate the true variance of common estimators on this data type. We develop novel methods to perform a variety of statistical analyses on such adaptively sampled data via Z-estimation. Specifically, we introduce the adaptive sandwich variance estimator, a corrected sandwich estimator that leads to consistent variance estimates under adaptive sampling. Additionally, to prove our results we develop novel theoretical tools for empirical processes on non-i.i.d., adaptively sampled longitudinal data, which may be of independent interest. This work is motivated by our efforts in designing experiments in which online reinforcement learning algorithms optimize treatment decisions, yet statistical inference is essential for conducting analyses after experiments conclude.
Keywords
Distributional reinforcement learning; regularized Wasserstein loss; multi-dimensional reward
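A minimal sketch of Z-estimation on longitudinal data pooled across users, with a user-clustered sandwich variance computed on simulated data; the paper's adaptive sandwich estimator further corrects the "meat" for the cross-user dependence induced by a pooling reinforcement learning algorithm, and that correction is not reproduced here:

```python
# Sketch only: pooled least squares as the estimating equation and a
# user-clustered sandwich variance; all data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(3)
n_users, T, d = 50, 30, 2

theta_true = np.array([1.0, -0.5])
X = rng.normal(size=(n_users, T, d))
y = X @ theta_true + rng.normal(0, 1.0, (n_users, T))

# Z-estimator: solve sum_i sum_t psi(theta; X_it, y_it) = 0 with
# psi = X_it (y_it - X_it' theta), i.e. pooled least squares.
Xf, yf = X.reshape(-1, d), y.reshape(-1)
theta_hat = np.linalg.solve(Xf.T @ Xf, Xf.T @ yf)

# Bread: average derivative of psi. Meat: user-level averages of psi, so that
# within-user dependence is retained rather than assumed away.
bread = Xf.T @ Xf / (n_users * T)
psi_user = np.einsum('utd,ut->ud', X, y - X @ theta_hat) / T   # mean psi per user
meat = psi_user.T @ psi_user / n_users
bread_inv = np.linalg.inv(bread)
var_hat = bread_inv @ meat @ bread_inv / n_users               # sandwich variance

print(theta_hat, np.sqrt(np.diag(var_hat)))                    # estimates and SEs
```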