Chain-linked multiple matrix integration via embedding alignment

Minh Tang Co-Author
North Carolina State University
 
Runbing Zheng First Author
Johns Hopkins University
 
Runbing Zheng Presenting Author
Johns Hopkins University
 
Thursday, Aug 7: 10:35 AM - 10:50 AM
0872 
Contributed Papers 
Music City Center 
Motivated by the increasing demand for multi-source data integration in various scientific fields, we study matrix completion when the data exhibit block-wise missing structure: only a few noisy submatrices, representing (overlapping) parts of the full matrix, are available. We propose the Chain-linked Multiple Matrix Integration (CMMI) procedure to efficiently combine the information in these individual noisy submatrices. CMMI first derives low-rank entity embeddings for each observed submatrix, then aligns these embeddings using the entities shared between pairs of submatrices, and finally aggregates them to reconstruct the entire matrix of interest. Under mild regularity conditions, we establish entrywise error bounds and normal approximations for the CMMI estimates. Simulation studies and real data applications show that CMMI is computationally efficient and effective in recovering the full matrix, even when overlaps between the observed submatrices are minimal.
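The embed, align, aggregate steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the simplest setting: a symmetric positive semidefinite low-rank matrix, exactly two observed diagonal blocks that together cover all entities, alignment by orthogonal Procrustes on the shared entities, and plain averaging of the aligned embeddings. The function names (spectral_embedding, procrustes, integrate_two_blocks) and the averaging step are hypothetical choices made for illustration.

```python
import numpy as np

def spectral_embedding(A, d):
    """Scaled top-d eigenvector embedding of a symmetric matrix A.
    Assumption: the underlying signal is positive semidefinite."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(vals)[::-1][:d]          # indices of the d largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def procrustes(X, Y):
    """Orthogonal matrix W minimizing ||X W - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def integrate_two_blocks(A1, idx1, A2, idx2, n, d):
    """Estimate an n x n rank-d matrix from two noisy diagonal blocks.
    idx1, idx2: entity indices of each block; they must overlap in at
    least d entities and jointly cover 0..n-1 (illustrative assumption)."""
    X1 = spectral_embedding(A1, d)
    X2 = spectral_embedding(A2, d)
    # Row positions of the shared entities inside each block
    _, pos1, pos2 = np.intersect1d(idx1, idx2, return_indices=True)
    W = procrustes(X1[pos1], X2[pos2])        # rotate block 1 into block 2's frame
    emb = np.zeros((n, d))
    counts = np.zeros(n)
    emb[idx1] += X1 @ W
    counts[idx1] += 1
    emb[idx2] += X2
    counts[idx2] += 1
    emb /= counts[:, None]                    # average on the overlap
    return emb @ emb.T

# Synthetic example: 30 entities, rank 2, overlap of 5 entities
rng = np.random.default_rng(0)
n, d = 30, 2
X = rng.normal(size=(n, d))
P = X @ X.T
idx1, idx2 = np.arange(0, 20), np.arange(15, 30)

def noisy_block(idx, sigma=0.01):
    E = rng.normal(scale=sigma, size=(len(idx), len(idx)))
    return P[np.ix_(idx, idx)] + (E + E.T) / 2   # symmetric noise

P_hat = integrate_two_blocks(noisy_block(idx1), idx1, noisy_block(idx2), idx2, n, d)
print("max entrywise error:", np.abs(P_hat - P).max())
```

In this sketch the entries for entity pairs never observed together (entities 0-14 against 20-29) are recovered only through the alignment on the five shared entities. Extending the idea to a chain of more than two submatrices would amount to composing the pairwise alignments along the chain.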

Keywords

2→∞ norm

normal approximations

matrix completion

data integration 

Main Sponsor

Section on Statistical Learning and Data Science