Tuesday, Aug 5: 10:30 AM - 12:20 PM
4100
Contributed Posters
Music City Center
Room: CC-Hall B
Main Sponsor
Section on Nonparametric Statistics
Presentations
Longitudinal network data reflects the dynamic evolution of network structures and attributes over time, offering a unique opportunity to explore temporal dynamics, uncover trends, and identify the mechanisms driving network evolution. These insights are particularly valuable in areas such as social networks, biological systems, communication networks, and neuroscience/neurology. In this study, we introduce a novel nonparametric hypothesis-testing method specifically tailored for longitudinal network data on a spherical surface. The proposed method begins with the construction of a network distance matrix on the manifold and accounts for the impact of serial correlation across multiple time points, ensuring that temporal dependencies are appropriately addressed. Experiments on both synthetic and real-world data demonstrate that the proposed method effectively controls type I error while maintaining robust statistical power to detect group or time effects and their interactions in network data.
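The abstract does not spell out the test statistic, so the following is only a simplified sketch of the distance-based idea at a single time point. It assumes each network is summarized by a unit vector on the sphere, uses the great-circle distance, and applies a PERMANOVA-style pseudo-F with label permutation as a stand-in. It ignores serial correlation, which the proposed method handles. All function names and data are hypothetical.

```python
import math
import random


def on_sphere(theta, phi):
    """Unit vector from polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def geodesic(u, v):
    """Great-circle distance between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding


def distance_matrix(points):
    n = len(points)
    return [[geodesic(points[i], points[j]) for j in range(n)] for i in range(n)]


def pseudo_f(dist, labels):
    """PERMANOVA-style pseudo-F computed from pairwise distances only."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permutation_pvalue(dist, labels, n_perm=999, seed=0):
    """Monte Carlo p-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: two tight clusters of six points each on the sphere.
points = ([on_sphere(0.10 + 0.02 * k, 0.1 * k) for k in range(6)]
          + [on_sphere(1.40 + 0.02 * k, 0.1 * k) for k in range(6)])
labels = ["A"] * 6 + ["B"] * 6
dist = distance_matrix(points)
p_value = permutation_pvalue(dist, labels)
```

Working from the distance matrix alone is what lets this kind of test apply on a manifold, where ordinary means and covariances are not directly available.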
Keywords
Longitudinal Network Analysis
Distance-based Repeated Measures MANOVA
Manifold Learning
Human trafficking is a critical issue, with online advertisements serving as proxies for illicit activity within the trafficking network. Law enforcement works diligently to disrupt these networks, but the long-term effectiveness of arrests in reducing online advertisements is unclear. Existing research highlights immediate impacts of law enforcement intervention but lacks consensus on sustained reductions. This study explores the relationship between arrests and fluctuations in trafficking ads using data from five cities. By analyzing time-series data, we investigate whether arrests trigger significant changes in ad volumes and identify potential changepoints associated with enforcement activity. Descriptive statistics reveal short-term declines in ad activity following arrests, though long-term patterns are ambiguous. Applying a nonparametric changepoint model, we observe short-term decreases but limited evidence for a lasting impact on ad activity. These findings suggest that while arrests disrupt trafficking activity, they may not produce sustained reductions. This research emphasizes the importance of data-informed strategies and coordinated interventions to combat human trafficking.
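The abstract does not name the specific changepoint model. As one illustrative stand-in, Pettitt's rank-based test locates a single shift in a series without assuming a distribution for the counts. The sketch below uses the standard approximate p-value for that test. The ad counts are made up for illustration and the function names are hypothetical.

```python
import math


def sign(value):
    return (value > 0) - (value < 0)


def pettitt_changepoint(series):
    """Return (t, K, approximate p-value) for a single changepoint.

    t is the number of observations in the first segment, so the
    estimated shift lies between series[t - 1] and series[t].
    """
    n = len(series)
    best_t, best_u = 1, 0
    for t in range(1, n):
        # U_t compares every point before the split with every point after it.
        u_t = sum(sign(series[i] - series[j])
                  for i in range(t) for j in range(t, n))
        if abs(u_t) > abs(best_u):
            best_t, best_u = t, u_t
    k = abs(best_u)
    p_value = min(1.0, 2.0 * math.exp(-6.0 * k ** 2 / (n ** 3 + n ** 2)))
    return best_t, k, p_value


# Hypothetical weekly ad counts with a drop after week 10.
weekly_ads = [12, 11, 13, 12, 14, 11, 13, 12, 12, 13,
              5, 6, 4, 5, 7, 5, 6, 4, 5, 6]
split, k_stat, p_approx = pettitt_changepoint(weekly_ads)
```

A test like this flags where a level shift occurs, but it does not by itself say whether the shift persists, which is the long-term question the study raises.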
Keywords
Nonparametric Changepoint Model
Time Series Analysis
Law Enforcement Impact
Human Trafficking
In this work, we propose a novel inferential procedure for randomized controlled trials (RCTs), assisted by machine-learning-based adjustment. The method was developed under Rosenbaum's framework of exact tests in randomized experiments with covariate adjustments. Through extensive simulation experiments, we show that the proposed method robustly controls the type I error and can boost the statistical efficiency of an RCT. This advantage is further demonstrated in a real-world example. The simplicity, flexibility, and robustness of the proposed method make it a competitive candidate as a routine inference procedure for RCTs, especially when nonlinear association or interaction among covariates is expected. Its application may substantially reduce the required sample size and cost of RCTs, such as phase III clinical trials.
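A minimal sketch of the general idea behind covariate-adjusted randomization tests, not the authors' procedure: fit a prediction model for the outcome from covariates alone (without using treatment labels), then run a permutation test on the residuals. To stay self-contained, an ordinary least-squares line stands in here for the machine learning model. The data, function names, and the Monte Carlo approximation of the permutation distribution are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares line; a stand-in for any ML model of y given covariates."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def diff_in_means(values, treat):
    treated = [v for v, z in zip(values, treat) if z == 1]
    control = [v for v, z in zip(values, treat) if z == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def adjusted_randomization_test(y, x, treat, n_perm=1999, seed=0):
    """Permutation test on residuals from a model that never sees treatment."""
    intercept, slope = fit_line(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    observed = diff_in_means(resid, treat)
    rng = random.Random(seed)
    z = list(treat)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(z)  # re-randomize treatment labels
        if abs(diff_in_means(resid, z)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical trial: outcome depends on a covariate plus a treatment effect of 3.
x = [float(i) for i in range(20)]
treat = [i % 2 for i in range(20)]
y = [2.0 + 0.5 * xi + 3.0 * zi + 0.1 * ((7 * i) % 5 - 2)
     for i, (xi, zi) in enumerate(zip(x, treat))]
effect, p_value = adjusted_randomization_test(y, x, treat)
```

Because the residuals are computed without reference to treatment assignment, they stay fixed under the sharp null hypothesis, which is why the permutation step remains valid whatever model produces the predictions.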
Keywords
Machine learning
Randomized controlled trial
Exact inference
Co-Author(s)
Alan Hutson, Roswell Park Cancer Institute
Xiaoyi Ma, Roswell Park Comprehensive Cancer Center
First Author
Han Yu, Roswell Park Comprehensive Cancer Center
Presenting Author
Han Yu, Roswell Park Comprehensive Cancer Center
The within-between model is a robust approach that addresses the constraints inherent in both fixed effects and random effects models by distinctly modeling within-group and between-group effects. This paper introduces a nonparametric extension of the Within-Between model for the analysis of hierarchical data using Bayesian Additive Regression Trees. Our extension permits flexible nonlinear relationships while preserving the interpretability benefits of the linear Within-Between framework. We establish theoretical guarantees on posterior concentration rates under appropriate conditions and present a framework for deriving interpretable summaries of the intricate nonparametric effects using surrogate models. Through simulation studies, we demonstrate the superior performance of our approach compared to existing methods, including linear fixed effects, random effects, and standard BART extensions, particularly when the true relationships are nonlinear. We illustrate the practical applicability of our method through its application to the National Education Longitudinal Study, wherein we analyze student dropout status while accounting for both student-level and school-level effects.
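For readers unfamiliar with the within-between decomposition, a small sketch follows. It splits a covariate into its group mean (the between component) and the deviation from that mean (the within component). In the linear model these two parts receive separate coefficients. Feeding them to a flexible learner such as BART is the direction the abstract describes, though the code below covers only the decomposition. Variable names and data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def within_between_split(values, groups):
    """Split each value into (deviation from group mean, group mean)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    means = {g: fmean(vs) for g, vs in by_group.items()}
    between = [means[g] for g in groups]
    within = [v - means[g] for v, g in zip(values, groups)]
    return within, between


# Hypothetical student-level covariate nested in two schools.
study_hours = [2.0, 4.0, 6.0, 10.0, 12.0, 14.0]
school = ["A", "A", "A", "B", "B", "B"]
within, between = within_between_split(study_hours, school)
# within  -> [-2.0, 0.0, 2.0, -2.0, 0.0, 2.0]
# between -> [4.0, 4.0, 4.0, 12.0, 12.0, 12.0]
```

The within component sums to zero inside every group, so it carries no information about differences between groups. That is what lets the two effects be modeled distinctly.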
Keywords
BART
Multilevel Modelling
Within-Between Model
Nonparametric Regression
Diabetes is a leading chronic condition that affects the glucose regulatory mechanism. Preventive care, such as physical activity, is essential to reduce the risk of diabetes onset. The All-of-US Research Program, launched by the NIH, records the daily active zone minutes of over 15,620 diverse participants across time. We conducted a retrospective study on All-of-US participants with data collected before the outbreak of COVID-19 in March 2020, when physical activity patterns began to shift. This project assessed the functional association between long-term physical activity and the risk of diabetes onset, using logistic regression with time-varying effects of daily activity durations. Individuals' long-term activity duration curves and effect curves are decomposed using shared orthonormal basis functions. We adopt the fused lasso to cluster individuals based on their latent projection features. Participants in the same subgroup share characteristic activity duration curves and functional effects of long-term physical activity. The subgroup functional effects are estimated through the alternating direction method of multipliers (ADMM). The details of the data analysis results are presented.
Keywords
Functional effects
Subgroup analysis
Time-varying effects
All-of-US research program
Fitbit
We used a novel shape-restricted Cox model to determine the preferred ER expression cutoff for predicting breast cancer prognosis. Our model treats ER as a continuous variable, using a flexible monotone-shaped Cox regression to assess its association with survival outcomes holistically. The study included 3055 patients with stage II/III HER2-negative breast cancer. The primary outcomes were time to recurrence or death (TTR) and overall survival (OS). The shape-restricted Cox model identified 10% ER as the preferred cutoff to predict TTR. This finding was confirmed by the log-rank test and a standard Cox model: patients with ER ≥ 10% had a TTR benefit over those with ER < 10% (log-rank p < 0.001). No OS or TTR benefit of adjuvant endocrine therapy was observed in patients with 1% ≤ ER < 10% (HR 0.877, 95% CI 0.481 – 1.600, p = 0.668 for TTR and HR 0.698, 95% CI 0.337 – 1.446, p = 0.333 for OS). Using the shape-restricted Cox model, this study suggests a potential preferred threshold of 10% for predicting TTR, assisting physicians in effectively weighing the benefits and risks of adjuvant endocrine therapy for patients with ER < 10% disease, particularly in cases with severe adverse events.
Keywords
Estrogen receptor
Threshold
Survival
Modelling
Endocrine therapy
Breast cancer
Co-Author(s)
Takeo Fujii, Center for Cancer Research, National Cancer Institute
Jing Ning, University of Texas, MD Anderson Cancer Center
Toshiaki Iwase, University of Hawaiʻi Cancer Center
Jing Qin, National Institute of Allergy and Infectious Diseases, NIH
Naoto Ueno, University of Hawaiʻi Cancer Center
Yu Shen, UT M.D. Anderson Cancer Center
First Author
Wenli Dong, UT MD Anderson Cancer Center
Presenting Author
Wenli Dong, UT MD Anderson Cancer Center
This paper deals with simultaneously testing whether k count variables, observed from independent samples, each have geometric laws, where the parameters of these k geometric laws may be different. A test statistic is proposed and shown to be asymptotically distribution free under the null hypothesis, where asymptotic means k→∞. For moderate values of k, this asymptotic null distribution yields conservative tests, so a bootstrap procedure is suggested to approximate the null distribution. Furthermore, this approximation is shown to be consistent. The asymptotic power of the test is also derived, allowing us to determine the alternatives that the new procedure is able to detect. The finite sample performance of the proposal is studied via numerical simulation. The test is also applied to the 2024 PGA golf Championship data set. Finally, we observe that the proposed procedure can be adapted to build goodness-of-fit tests for other distributions in multi-sample settings.
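The abstract does not give the form of the test statistic, so the sketch below only illustrates the parametric bootstrap step in a multi-sample setting. It assumes geometric laws supported on {0, 1, 2, ...}, estimates each parameter by maximum likelihood, and uses a simple moment-based discrepancy (the squared gap between the sample variance and m(1 + m), the variance implied by a geometric law with mean m) as a stand-in statistic. The data and all function names are hypothetical.

```python
import math
import random
from statistics import fmean


def fit_p(sample):
    """MLE of p for a geometric law on {0, 1, 2, ...}: p = 1 / (1 + mean)."""
    return 1.0 / (1.0 + fmean(sample))


def draw_geometric(p, rng):
    """Inverse-transform draw with P(X >= k) = (1 - p) ** k."""
    if p >= 1.0:
        return 0
    return int(math.log(1.0 - rng.random()) / math.log(1.0 - p))


def dispersion_gap(sample):
    """Stand-in discrepancy: (sample variance - m * (1 + m)) squared."""
    m = fmean(sample)
    var = fmean([(x - m) ** 2 for x in sample])
    return (var - m * (1.0 + m)) ** 2


def statistic(samples):
    """Sum the per-sample discrepancies across the k samples."""
    return sum(dispersion_gap(s) for s in samples)


def bootstrap_pvalue(samples, n_boot=200, seed=0):
    """Resample each sample from its own fitted geometric law."""
    rng = random.Random(seed)
    observed = statistic(samples)
    fitted = [fit_p(s) for s in samples]
    hits = 0
    for _ in range(n_boot):
        fake = [[draw_geometric(p, rng) for _ in s]
                for p, s in zip(fitted, samples)]
        if statistic(fake) >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)


# Hypothetical counts from k = 3 independent samples.
samples = [[0, 2, 1, 5, 0, 3, 1, 0],
           [4, 0, 7, 2, 1, 9, 3, 0],
           [1, 0, 0, 2, 1, 0, 4, 1]]
p_value = bootstrap_pvalue(samples)
```

Each bootstrap sample is drawn with its own fitted parameter, mirroring the null hypothesis that the k geometric laws may differ.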
Keywords
goodness-of-fit
count data
many samples
bootstrap
consistency
asymptotic power
Due to their parsimony, separable covariance models have been popular in modeling matrix-variate data. However, inference from such a model may be misleading if the population covariance matrix is not actually separable. This suggests the use of statistical tests of covariance separability. Likelihood ratio tests have tractable null distributions and good power when the sample size $n$ is not less than the number of variables $p$, but are not well-defined otherwise. Other existing separability tests for the $p>n$ case have low power for small sample sizes and have null distributions that depend on unknown parameters, preventing exact error rate control. To address these issues, we propose novel invariant tests leveraging the core covariance matrix, a complementary notion to a separable covariance matrix. We show that testing separability of a covariance matrix is equivalent to testing sphericity of its core component. Based on this observation, we construct test statistics that are well-defined in high-dimensional settings and have distributions that are invariant under the null hypothesis of separability, allowing for exact simulation of null distributions. We study asymptotic null distributions and show consistency of our tests in a $p/n\rightarrow\gamma\in(0,\infty)$ asymptotic regime. Via simulation studies, we illustrate the high power of our proposed tests as compared to existing procedures.
Keywords
Core covariance matrix
eigenvalues
hypothesis testing
invariance
separable covariance matrix
separable covariance expansion