Selective Inference for Multivariate Regression Trees

Karl Gregory Co-Author
Academic Advisor
 
Le Chang First Author
 
Le Chang Presenting Author
 
Tuesday, Aug 5: 12:05 PM - 12:20 PM
2670 
Contributed Papers 
Music City Center 
We consider post-selection inference for regression trees when the response is multivariate. In particular, we study how to test hypotheses suggested by the fitted tree, such as whether the populations represented by two sibling nodes have the same mean. We find, as is known for a univariate response, that controlling the Type I error rate requires conditioning on the recursive data splits that lead to the hypothesis in question. With a univariate response, conditioning on the splits truncates the null distribution of the test statistic, so that p-values must be computed with respect to truncated normal distributions. With a multivariate response, we find that p-values must be computed with respect to truncated multivariate normal distributions, where the truncation set is defined by a list of quadratic constraints. We show that accept-reject Monte Carlo simulation can give reliable post-selection p-values with a bivariate response and a fairly small number of predictors. Accommodating more predictors will require more efficient ways to compute probabilities under truncated multivariate normal distributions.
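The accept-reject idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `truncated_mvn_pvalue`, the encoding of each quadratic constraint as a triple `(A, b, c)` meaning y'Ay + b'y + c <= 0, and the squared-norm test statistic are all assumptions made for illustration. In the actual method the constraints would be derived from the recursive splits of the fitted tree, which the abstract does not spell out.

```python
import numpy as np


def truncated_mvn_pvalue(stat, observed, mean, cov, constraints,
                         n_draws=100_000, seed=0):
    """Accept-reject Monte Carlo p-value under a truncated multivariate normal.

    Hypothetical interface (an assumption, not taken from the paper):
    each constraint is a triple (A, b, c) encoding y'Ay + b'y + c <= 0.
    `stat` maps an (n, d) array of draws to an (n,) array of statistics.
    Returns the estimated p-value and the number of accepted draws.
    """
    rng = np.random.default_rng(seed)
    # Draw from the untruncated null distribution.
    draws = rng.multivariate_normal(mean, cov, size=n_draws)

    # Keep only draws that satisfy every quadratic constraint.
    keep = np.ones(n_draws, dtype=bool)
    for A, b, c in constraints:
        quad = np.einsum("ni,ij,nj->n", draws, A, draws) + draws @ b + c
        keep &= quad <= 0
    accepted = draws[keep]
    if accepted.shape[0] == 0:
        raise RuntimeError("no draws fell inside the truncation set")

    # Share of accepted draws at least as extreme as the observed statistic.
    p_value = float(np.mean(stat(accepted) >= observed))
    return p_value, accepted.shape[0]


def squared_norm(y):
    """Illustrative test statistic: squared Euclidean norm of each draw."""
    return np.sum(y ** 2, axis=1)


# Toy bivariate example: truncate to the region outside the unit disk,
# i.e. 1 - y'y <= 0, encoded as A = -I, b = 0, c = 1.
outside_unit_disk = [(-np.eye(2), np.zeros(2), 1.0)]
p_value, n_accepted = truncated_mvn_pvalue(
    squared_norm, observed=2.0, mean=np.zeros(2), cov=np.eye(2),
    constraints=outside_unit_disk,
)
print(p_value, n_accepted)
```

The acceptance rate falls as the truncation set shrinks, so the number of accepted draws should be checked alongside the p-value. This is one way to see why the approach becomes costly as more predictors, and hence more constraints, are involved.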

Keywords

post-selection inference

regression tree

MCMC 

Main Sponsor

Statistics Without Borders