A computationally efficient TreeScan implementation via pruning

Shirley Wang Co-Author
 
Georg Hahn First Author
 
Georg Hahn Presenting Author
 
Wednesday, Aug 6: 10:35 AM - 10:50 AM
1811 
Contributed Papers 
Music City Center 
Tree-based scan statistics (TBSSs) are popular methods for disproportionality analysis, allowing one to search for potential increased risks of drugs and vaccines among thousands of hierarchically related outcomes. Using the Dvoretzky-Kiefer-Wolfowitz inequality, we compute statistically valid bounds on the p-values calculated by TBSS and use those bounds in two ways. First, we quickly estimate the number of signals in the data, exploiting the fact that for non-significant nodes the lower bound on the p-value usually rises above the significance threshold early in the TBSS run. Second, we prune non-significant nodes, thereby reducing the size of the tree and speeding up the computation. Using a clinically relevant real-data example (risk assessment of SGLT2 and GLP1 inhibitors via hierarchical testing based on the International Classification of Diseases, ICD-10), we demonstrate that pruning considerably reduces the computational effort of TBSS while discovering the same signals.
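The bounding and pruning idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a node's Monte Carlo p-value is estimated as the plain fraction of null replicates whose test statistic is at least the observed one, and that the Dvoretzky-Kiefer-Wolfowitz band is applied to that empirical fraction. The function names (`dkw_epsilon`, `p_value_bounds`, `can_prune`) and the parameters `alpha` and `delta` are hypothetical and chosen for illustration only.

```python
import math


def dkw_epsilon(n_sims: int, delta: float = 0.05) -> float:
    """Half-width eps of the DKW band after n_sims null replicates.

    DKW inequality: P(sup_x |F_n(x) - F(x)| > eps) <= 2 * exp(-2 * n * eps**2).
    Setting the right-hand side equal to delta and solving for eps gives
    eps = sqrt(log(2 / delta) / (2 * n)).
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_sims))


def p_value_bounds(n_exceed: int, n_sims: int, delta: float = 0.05):
    """Lower and upper bound on a node's p-value, valid with prob. >= 1 - delta.

    n_exceed is the number of null replicates whose statistic was at least
    as large as the observed one (illustrative assumption, see lead-in).
    """
    p_hat = n_exceed / n_sims
    eps = dkw_epsilon(n_sims, delta)
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


def can_prune(n_exceed: int, n_sims: int,
              alpha: float = 0.05, delta: float = 0.05) -> bool:
    """A node may be pruned once its p-value lower bound exceeds alpha."""
    lower, _ = p_value_bounds(n_exceed, n_sims, delta)
    return lower > alpha
```

Because the band narrows at rate 1/sqrt(n), a node whose estimated p-value lies far above the threshold can be declared non-significant after comparatively few replicates, whereas a node near the threshold needs many more before a decision is possible.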

Keywords

hypothesis testing

TreeScan

hierarchical outcomes

International Classification of Diseases

TBSS

risks of drugs 

Main Sponsor

Section on Statistical Computing