Print Close

Recent Advances in Statistical Learning for Optimal Decision-Making with Applications

Runmin Wang Chair
Texas A&M University

Zifeng Zhao Organizer
University of Notre Dame

Runmin Wang Organizer
Texas A&M University

Sunday, Aug 3: 2:00 PM - 3:50 PM
0119
Invited Paper Session

Music City Center

Room: CC-101D

Keywords

Online Learning

Applied

Main Sponsor

Business and Economic Statistics Section

Co Sponsors

IMS

Section on Statistical Learning and Data Science

Presentations

Personalized Reinforcement Learning with Applications to Recommender System

Reinforcement learning (RL) has achieved remarkable success across various domains; however, its applicability is often hampered by challenges in practicality and interpretability. Many real-world applications, such as in healthcare and business settings, have large and/or continuous state and action spaces and demand personalized solutions. In addition, the interpretability of the model is crucial to decision-makers so as to guide their decision-making process while incorporating their domain knowledge. To bridge this gap, we propose a personalized reinforcement learning framework that integrates personalized information into the state-transition and reward-generating mechanisms. We develop an online RL algorithm for our framework. Specifically, our algorithm learns the embeddings of the personalized state-transition distribution in a Reproducing Kernel Hilbert Space (RKHS) by balancing the exploitation-exploration tradeoff. We further provide the regret bound of the algorithm and demonstrate its effectiveness in recommender systems.

Keywords

reinforcement learning

Speaker

Linda Zhao, University of Pennsylvania

Latent Agents in Networks: Estimation and Targeting

We consider a platform that serves (observable) agents, who belong to a larger network that also includes additional agents who are not served by the platform. We refer to the latter group of agents as latent agents. Associated with each agent are the agent's covariate and outcome. The platform has access to past covariates and outcomes of the observable agents, but no data for the latent agents is available to the platform. Crucially, the agents influence each other's outcome through a certain influence structure. In particular, observable agents influence each other both directly and indirectly through the influence they exert on the latent agents. The platform doesn't know the inference structure of either the observable or the latent parts of the network. We investigate how the platform can estimate the dependence of the observable agents' outcomes on their covariates, taking the presence of the latent agents into account. First, we show that a certain matrix succinctly captures the relationship between the outcomes and the covariates. We provide an algorithm that estimates this matrix using historical data of covariates and outcomes for the observable agents under a suitable approximate sparsity condition. We also establish convergence rates for the proposed estimator despite the high dimensionality that allows more agents than observations. Second, we show that the approximate sparsity condition holds under the standard conditions used in the literature. Hence, our results apply to a large class of networks. Finally, we illustrate the applications to a targeted advertising problem. We show that, by using the available historical data with our estimator, it is possible to obtain asymptotically optimal advertising decisions despite the presence of latent agents.

Keywords

Network analysis

Speaker

Alexandre Belloni, Duke University

Instrumental variable value iteration for causal offline reinforcement learning

In offline reinforcement learning (RL) an optimal policy is learnt solely from a priori collected observational data. However, in observational data, actions are often confounded by unobserved variables. Instrumental variables (IVs), in the context of RL, are the variables whose influence on the state variables are all mediated through the action. When a valid instrument is present, we can recover the confounded transition dynamics through observational data. We study a confounded Markov decision process where the transition dynamics admit an additive nonlinear functional form. Using IVs, we derive a conditional moment restriction (CMR) through which we can identify transition dynamics based on observational data. We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of CMR. To the best of our knowledge, this is the first provably efficient algorithm for instrument-aided offline RL.

Keywords

offline reinforcement learning

Speaker

Mladen Kolar

Generalized Tensor Completion for Noisy Data with Non-Random Missingness

Tensor completion plays a crucial role in a wide range of applications, including recommender systems and medical imaging, where observed data are often highly incomplete. While extensive prior work has addressed tensor completion with data missingness, most assume that missing entries occur randomly. However, real-world data often exhibit missing-not-at-random patterns, where missingness depends on the underlying tensor values. This paper introduces a generalized tensor completion framework for noisy data with non-random missingness, where the missing probability is modeled as a function of underlying tensor values. Our formulation is flexible and accommodates various tensor data types, including continuous, binary, and count data. For model estimation, we develop a computationally efficient alternating gradient descent algorithm and derive non-asymptotic error bounds for the estimator at each iteration. Additionally, we propose a statistical inferential procedure to test whether missing probabilities depend on tensor values, offering a formal assessment of the missing-at-random assumption within our modeling framework. The utility and efficacy of our approach are demonstrated through comparative simulation studies and analyses of two real-world datasets.

Keywords

graphical model with covariates

multi-task learning

debiased inference

Speaker

Emma Jingfei Zhang, Emory University

Doubly robust alignment for large language models

This paper studies reinforcement learning from human feedback (RLHF) for aligning large language models with human preferences. While RLHF has demonstrated promising results, many algorithms are highly sensitive to misspecifications in the underlying preference model (e.g., the Bradley-Terry model), the reference policy, or the reward function, resulting in undesirable fine-tuning. To address model misspecification, we propose a doubly robust preference optimization algorithm that remains consistent when either the preference model or the reference policy is correctly specified (without requiring both). Our proposal demonstrates superior and more robust performance than stateof-the-art algorithms, both in theory and in practice. The code is available at https: //github.com/DRPO4LLM/DRPO4LLM

Keywords

experimental design

Speaker

Chengchun Shi