Semiparametric Correlation Estimation in Multivariate BWAS

Ishaan Gadiyar Co-Author
Vanderbilt University Medical Center
 
Xinyu Zhang Co-Author
 
Kaidi Kang Co-Author
Vanderbilt University
 
Edward Kennedy Co-Author
 
Aaron Alexander-Bloch Co-Author
Department of Psychiatry, University of Pennsylvania
 
Jakob Seidlitz Co-Author
Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA
 
Simon Vandekar Co-Author
Vanderbilt University
 
Megan Jones First Author
 
Megan Jones Presenting Author
 
Monday, Aug 4: 9:50 AM - 10:05 AM
2580 
Contributed Papers 
Music City Center 
Multivariate brain-wide association studies (BWAS) use machine learning (ML) models to predict phenotypes from high-dimensional brain imaging data. For a continuous predicted feature, Pearson's correlation between the predicted and actual values of that feature is often used to quantify model accuracy in test data; however, the parameter this correlation is meant to estimate is not made explicit. We rigorously define multiple candidate parameters and show that the standard Pearson estimator is biased for the typical parameter of interest in multivariate BWAS. Using flexible ML models affects the rate of convergence to the true parameter, and the sample size needed for convergence often exceeds that of existing neuroimaging datasets. Additionally, the typical Fisher confidence intervals for Pearson's correlation undercover, meaning their actual coverage falls below the nominal level. We use semiparametric theory to derive a new estimator based on the efficient influence function of the target parameter. This estimator converges to the parameter at reasonable sample sizes and admits a confidence interval procedure that achieves nominal or near-nominal coverage. We show how researchers can report estimates, confidence intervals, and p-values for model accuracy without the need for permutation testing.
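
For reference, the conventional approach discussed in the abstract can be sketched as follows. This is a minimal illustration of the standard Pearson estimate with a Fisher z-transform confidence interval computed on held-out predictions; it is not the semiparametric estimator proposed in this work. The function names `pearson_r` and `fisher_ci` and the toy inputs are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist


def pearson_r(predicted, actual):
    """Sample Pearson correlation between predicted and actual test-set values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    s_pa = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    return s_pa / math.sqrt(s_pp * s_aa)


def fisher_ci(r, n, level=0.95):
    """Fisher z-transform interval for a correlation r from n test observations.

    Requires n > 3 and |r| < 1. This is the classical interval, which relies on
    the usual bivariate-normal approximation.
    """
    z = math.atanh(r)                  # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)        # approximate standard error on the z scale
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - q * se), math.tanh(z + q * se)


# Toy held-out data (hypothetical values, for illustration only)
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
actual = [2.0, 1.0, 4.0, 3.0, 5.0]

r_hat = pearson_r(predicted, actual)
lower, upper = fisher_ci(r_hat, len(predicted))
```

Intervals of this form are the ones the abstract reports as undercovering in the multivariate BWAS setting, so the sketch is best read as the baseline being compared against rather than a recommended procedure.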

Keywords

machine learning

brain-wide association studies

correlation

semiparametric

prediction

neuroimaging 

Main Sponsor

Section on Statistics in Imaging