WITHDRAWN - Learning Counterfactual Distributions via Kernel Nearest Neighbors

Jacob Feitelberg (Co-Author), Columbia University
Caleb Chin (Co-Author), Cornell University
Anish Agarwal (Co-Author), Columbia University
Raaz Dwivedi (Co-Author), UC Berkeley
Kyuseong Choi (First Author)
Sunday, Aug 3: 5:05 PM - 5:20 PM
1485 
Contributed Papers 
Music City Center 
Consider a setting with multiple units (e.g., individuals, cohorts, geographic locations) and outcomes (e.g., treatments, times, items), where the goal is to learn a multivariate distribution for each unit-outcome entry, such as the distribution of a user's weekly spend and engagement under a specific mobile app version. A common challenge is the prevalence of missing-not-at-random data, where the missingness can be correlated with properties of the distributions themselves, i.e., there is unobserved confounding. An additional challenge is that for any observed unit-outcome entry, we only have a finite number of samples from the underlying distribution. We tackle these two challenges by casting the problem in a novel distributional matrix completion framework and introducing a kernel-based distributional generalization of nearest neighbors to estimate the underlying distributions. By leveraging maximum mean discrepancies and a suitable factor model on the kernel mean embeddings of the underlying distributions, we establish consistent recovery of the underlying distributions even when data is missing not at random and positivity constraints are violated.
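The kernel nearest-neighbor idea in the abstract can be sketched on toy data: measure distances between entries' empirical distributions with a squared maximum mean discrepancy (MMD), then pool samples from units whose observed entries are close to the target unit's. This is a minimal illustrative sketch, not the authors' estimator; the Gaussian kernel, the bandwidth `bw`, the threshold `eta`, the univariate samples, and the helper names `rbf_kernel`, `mmd2`, and `impute_entry` are all assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(x, y, bw=1.0):
    # Gaussian kernel matrix between two 1-D sample arrays
    # (the multivariate case is analogous with vector norms).
    d = x[:, None] - y[None, :]
    return np.exp(-(d ** 2) / (2.0 * bw ** 2))

def mmd2(x, y, bw=1.0):
    # Biased (V-statistic) estimate of squared MMD between two
    # empirical samples: ||mean embedding of x - mean embedding of y||^2.
    return (rbf_kernel(x, x, bw).mean()
            + rbf_kernel(y, y, bw).mean()
            - 2.0 * rbf_kernel(x, y, bw).mean())

# Toy 4-unit x 2-outcome grid of sample arrays; entry (0, 1) is missing.
# Units 0 and 1 share the same latent distributions; units 2 and 3 differ.
samples = [
    [rng.normal(0, 1, 200), None],                    # unit 0: target row
    [rng.normal(0, 1, 200), rng.normal(5, 1, 200)],   # unit 1: true neighbor
    [rng.normal(3, 1, 200), rng.normal(-2, 1, 200)],  # unit 2
    [rng.normal(3, 1, 200), rng.normal(-2, 1, 200)],  # unit 3
]

def impute_entry(samples, i, j, bw=1.0, eta=0.1):
    # Distributional nearest-neighbor step: pool column-j samples from
    # units whose overlapping observed columns lie within average
    # squared-MMD eta of unit i's columns.
    pooled = []
    for k, row in enumerate(samples):
        if k == i or row[j] is None:
            continue
        dists = [mmd2(samples[i][c], row[c], bw)
                 for c in range(len(row))
                 if c != j and samples[i][c] is not None and row[c] is not None]
        if dists and np.mean(dists) <= eta:
            pooled.append(row[j])
    return np.concatenate(pooled) if pooled else None

est = impute_entry(samples, 0, 1)  # pooled samples approximating entry (0, 1)
```

Here only unit 1 falls within the MMD radius of unit 0, so the pooled samples come from unit 1's column-1 distribution; averaging kernel mean embeddings over such neighbors is the distributional analogue of averaging scalar outcomes in classical nearest neighbors.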

Keywords

Distribution recovery

Kernel methods

Missing-not-at-random

Nearest neighbors

Mean embedding factor model 

Main Sponsor

IMS