Heterogeneous transfer learning for high dimensional regression with feature mismatch

Massimiliano Russo Co-Author
The Ohio State University
 
Subhadeep Paul Co-Author
The Ohio State University
 
Jae Ho Chang First Author
The Ohio State University
 
Jae Ho Chang Presenting Author
The Ohio State University
 
Tuesday, Aug 5: 11:20 AM - 11:35 AM
1887 
Contributed Papers 
Music City Center 

Description

We study how to transfer knowledge from a source (proxy) domain to a target domain in order to learn a high-dimensional regression model when the two domains may have different features. The statistical properties of homogeneous transfer learning have recently been investigated, but most homogeneous transfer and multi-task learning methods assume that the target and proxy domains share the same feature space. In practice, the target and proxy feature spaces are often not fully matched, because some variables cannot be measured in data-poor target environments. Existing heterogeneous transfer learning methods do not provide statistical error guarantees. We propose a two-stage method that first learns the relationship between the missing and observed features through a projection step and then solves a joint penalized optimization problem. We develop upper bounds on the method's parameter estimation and prediction risks, assuming that the proxy and target domain parameters are sparsely different. Our results elucidate how the estimation and prediction errors depend on the complexity of the model, the sample size, the extent of feature overlap, and the correlation between matched and mismatched features.
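To make the two-stage idea concrete, the following is a minimal sketch and not the authors' estimator. It assumes that the projection step is a lasso regression of each mismatched feature on the matched features using proxy data, and that the joint penalized problem is a lasso over a shared coefficient vector plus a sparse target-specific shift. The function names (`soft_threshold`, `lasso_cd`, `two_stage_transfer`), the penalty levels, and the synthetic data are all hypothetical choices made for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for min_b (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - Xb, with b starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # correlation of column j with the partial residual
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * step
                b[j] = new
    return b

def two_stage_transfer(Xs, Zs, ys, Xt, yt, lam_proj=0.05, lam_joint=0.05):
    """Xs, Zs, ys: proxy data (matched features, mismatched features, response).
    Xt, yt: target data, where the mismatched features are unobserved."""
    p, q = len(Xs[0]), len(Zs[0])
    # Stage 1 (assumed form): project each mismatched feature on the matched ones.
    P = [lasso_cd(Xs, [row[k] for row in Zs], lam_proj) for k in range(q)]
    Zt_hat = [[sum(x[j] * P[k][j] for j in range(p)) for k in range(q)] for x in Xt]
    # Stage 2 (assumed form): proxy coefficient = w, target coefficient = w + delta,
    # with an l1 penalty on both w and delta, solved as one lasso on a stacked design.
    d = p + q
    rows = [xs + zs + [0.0] * d for xs, zs in zip(Xs, Zs)]
    rows += [xt + zt + xt + zt for xt, zt in zip(Xt, Zt_hat)]
    coef = lasso_cd(rows, ys + yt, lam_joint)
    w, delta = coef[:d], coef[d:]
    return [a + s for a, s in zip(w, delta)], Zt_hat

# Synthetic illustration: 3 matched and 2 mismatched features.
random.seed(0)
p, q, n_s, n_t = 3, 2, 60, 15
beta = [1.0, 0.0, -1.0, 0.5, 0.0]

def draw(n):
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # mismatched features are correlated with the first q matched features
    Z = [[x[k] + 0.3 * random.gauss(0, 1) for k in range(q)] for x in X]
    y = [sum(c * v for c, v in zip(beta, x + z)) + 0.1 * random.gauss(0, 1)
         for x, z in zip(X, Z)]
    return X, Z, y

Xs, Zs, ys = draw(n_s)
Xt, _, yt = draw(n_t)  # the target's mismatched features are discarded as unobserved
beta_hat, Zt_hat = two_stage_transfer(Xs, Zs, ys, Xt, yt)
```

In this sketch the quality of the imputed features `Zt_hat` depends on how strongly the mismatched features correlate with the matched ones, which is one of the quantities the abstract says drives the error bounds. Using a single penalty level for both the shared and the shift coefficients is a simplification; separate levels could be obtained by rescaling the shift columns.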

Keywords

Transfer Learning

Data Integration

Trustworthy AI

Feature Mismatch

Penalized Regression

Main Sponsor

Section on Statistical Learning and Data Science