Sunday, Aug 3: 4:00 PM - 5:50 PM
4018
Contributed Papers
Music City Center
Room: CC-201B
Main Sponsor
ENAR
Presentations
Variation in laboratory assays can contribute to measurement error. Careful planning can minimize differential errors in effect measures. Randomization can help to ensure that the sequencing of samples within and across batches is independent of sample characteristics. Batches may comprise multiple plates. We developed an algorithm to assign samples to batches that: 1) allows for variation in plate sizes within batches; 2) treats samples from matched study subjects, such as cases and controls or exposed and unexposed individuals, as a set; 3) randomizes sets to assigned batches; and 4) orders samples randomly within sets. To evaluate variation within and between batches, quality control samples are: 1) assigned both within and across batches; and 2) required, when replicates fall in the same batch, to be placed a specified distance apart. An option in the tool allows for minimal rearrangement of samples when a sequence of assays requiring different batch sizes is being conducted. A validation step verifies that the algorithm's arguments are met. An R package with this tool is being developed, including a vignette and a test dataset.
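A minimal sketch of the core assignment logic appears below, assuming matched sets and fixed batch capacities; the function name and arguments are hypothetical (the actual tool is being released as an R package), and the quality-control placement and plate-size options are omitted.

```python
import random

def assign_batches(matched_sets, batch_sizes, seed=None):
    """Randomly assign matched sets of samples to batches (illustrative sketch).

    matched_sets : list of lists; each inner list holds the sample IDs for one
                   matched set (e.g., a case and its matched controls).
    batch_sizes  : list of ints; capacity (number of samples) of each batch.
    Returns a list of batches, each a list of sample IDs in run order.
    """
    rng = random.Random(seed)
    sets = list(matched_sets)
    rng.shuffle(sets)                      # randomize set-to-batch assignment

    batches, current, b = [], [], 0
    for s in sets:
        if b >= len(batch_sizes):
            raise ValueError("batch capacities too small for all matched sets")
        if len(current) + len(s) > batch_sizes[b]:   # start the next batch
            batches.append(current)
            current, b = [], b + 1
            if b >= len(batch_sizes):
                raise ValueError("batch capacities too small for all matched sets")
        members = list(s)
        rng.shuffle(members)               # randomize sample order within the set
        current.extend(members)
    batches.append(current)

    # Validation step: every matched set must sit inside a single batch.
    placed = {sid: i for i, batch in enumerate(batches) for sid in batch}
    for s in matched_sets:
        assert len({placed[sid] for sid in s}) == 1, "matched set split across batches"
    return batches

# Example: 6 case-control pairs assigned to two batches of 8 wells each.
sets = [[f"case{i}", f"ctrl{i}"] for i in range(6)]
print(assign_batches(sets, batch_sizes=[8, 8], seed=42))
```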
Keywords
randomization
case-control
laboratory assays
bias
Disclaimer:
The authors have no conflicts of interest to disclose. The views, information or content, and conclusions presented do not necessarily represent the official policy or position of, nor should any official endorsement be inferred on the part of, the Uniformed Services University, the Department of Defense, the U.S. Government, or The Henry M. Jackson Foundation.
In ASPIRE, a cluster-randomized trial, pediatric primary care clinics receive either facilitation or no facilitation for delivering a secure firearm program. Under this program, clinicians provide both counseling and free gun locks to parents. Randomization should enable non-parametric estimation of the ATE, but clinicians document their own delivery of the program, which may not reflect true delivery. In a follow-up study to address this classification error, parents are asked to validate clinicians' documentation, but only a fraction volunteer. In this setting where a non-random internal validation set is available, we demonstrate that it is possible to use the relationship between gold-standard (parent) and silver-standard (clinician) measures to target the ATE without bias. Moreover, we show that our method is valid even when selection into the validation sample depends on the true outcome. Simulation studies demonstrate acceptable finite sample performance of our estimators with cluster-robust variance expressions in the presence of misclassification and selection bias in the validation set. We apply our methods to ASPIRE to assess the impact of facilitation on program delivery.
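The sketch below is not the authors' estimator; it is a simplified illustration, assuming the validation subsample is a random draw and ignoring clustering, of how gold-standard measurements on a subsample can correct an ATE computed from misclassified silver-standard outcomes (a Rogan-Gladen-style correction). The abstract's method goes further by allowing selection into validation to depend on the true outcome and by providing cluster-robust variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate a toy trial with a misclassified binary outcome ---------------
n = 5000
arm = rng.integers(0, 2, n)                    # 1 = facilitation, 0 = control
p_true = np.where(arm == 1, 0.60, 0.40)        # true program-delivery rates
y_true = rng.binomial(1, p_true)               # gold standard (parent report)
sens, spec = 0.85, 0.90                        # clinician documentation errors
y_doc = np.where(y_true == 1,
                 rng.binomial(1, sens, n),     # documented given delivered
                 rng.binomial(1, 1 - spec, n)) # documented given not delivered

# --- Validation subsample (drawn at random here for simplicity) -------------
val = rng.random(n) < 0.20
sens_hat = np.mean(y_doc[val & (y_true == 1)])
spec_hat = np.mean(1 - y_doc[val & (y_true == 0)])

# --- Rogan-Gladen-style correction of each arm's documented rate ------------
def corrected_rate(y_obs):
    return (y_obs.mean() - (1 - spec_hat)) / (sens_hat + spec_hat - 1)

ate_naive = y_doc[arm == 1].mean() - y_doc[arm == 0].mean()
ate_corr = corrected_rate(y_doc[arm == 1]) - corrected_rate(y_doc[arm == 0])
print(f"true ATE 0.20 | naive {ate_naive:.3f} | corrected {ate_corr:.3f}")
```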
Keywords
cluster-randomized trial
measurement error
selection bias
causal inference
Change point detection (CPD) is essential in identifying structural shifts in time-series data, with applications spanning finance, healthcare, and environmental monitoring. Traditional CPD methods often assume normality, which fails to capture real-world data that exhibit skewness and heavy tails. This talk explores using skew-t distributions in CPD, providing a more robust framework for detecting distributional shifts.
We introduce parametric and non-parametric CPD approaches, emphasizing a Bayesian Information Criterion (BIC)-based method tailored for skewed data. Applications include changes in financial market regimes, environmental monitoring of heavy metal contamination, and healthcare analytics such as glaucoma progression modeling. Additionally, we highlight the integration of CPD in machine learning and AI, including concept drift detection, anomaly detection, and reinforcement learning.
By leveraging skew-t distributions, we enhance the accuracy of CPD models in capturing asymmetric and long-tailed data, offering more reliable insights across disciplines.
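As a rough illustration of a BIC-based search, the sketch below scans all admissible single change points, fits each segment by maximum likelihood under a skewed family, and keeps a change point only if its BIC beats the no-change model. SciPy's skew-normal is used as a readily available stand-in for the skew-t family discussed in the talk; all names and settings are illustrative.

```python
import numpy as np
from scipy import stats

def seg_loglik(x, dist=stats.skewnorm):
    """Maximum log-likelihood of one segment under the given family."""
    params = dist.fit(x)
    return np.sum(dist.logpdf(x, *params)), len(params)

def bic_single_changepoint(x, dist=stats.skewnorm, min_seg=20):
    """Return (tau, bic_change, bic_none); tau is None if no change is preferred."""
    n = len(x)
    ll0, k = seg_loglik(x, dist)
    bic_none = -2 * ll0 + k * np.log(n)

    best_tau, best_bic = None, np.inf
    for tau in range(min_seg, n - min_seg):          # brute-force single-change search
        ll_left, _ = seg_loglik(x[:tau], dist)
        ll_right, _ = seg_loglik(x[tau:], dist)
        bic = -2 * (ll_left + ll_right) + (2 * k + 1) * np.log(n)  # +1 for tau
        if bic < best_bic:
            best_tau, best_bic = tau, bic
    if best_bic < bic_none:
        return best_tau, best_bic, bic_none
    return None, best_bic, bic_none

# Example: a shift in location and skewness halfway through the series.
x = np.concatenate([stats.skewnorm.rvs(4, loc=0, scale=1, size=100, random_state=1),
                    stats.skewnorm.rvs(-4, loc=2, scale=1, size=100, random_state=2)])
print(bic_single_changepoint(x))
```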
Keywords
Change Point Detection
Skew-t Distribution
Bayesian Information Criterion
Machine Learning
AI Model Adaptation
Concept Drift
Anomaly Detection
As solar energy continues to grow as a key component of the global energy mix, accurate forecasting of solar irradiance becomes more crucial for ensuring a reliable electricity supply. However, existing forecasting methods often fail to capture fine temporal variations in solar irradiance, particularly in regions where local weather conditions play a significant role. This research addresses the growing need for accurate solar irradiance forecasting to optimize the integration of solar energy into the grid. By using raw data, we aim to preserve important short-term fluctuations that are crucial for precise forecasts. The focus was on downscaling global solar irradiance data from a 15-minute resolution to a finer 5-minute local resolution for Brookings, South Dakota. A transformer-based model was applied to forecast solar power output, applying different approaches to assess the effectiveness of various downscaling methods. The model was trained on historical data and used to generate short-term forecasts over a 24-hour horizon, with performance evaluated using standard error metrics. The findings highlight the potential of transformer models for improving solar irradiance forecasts.
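A generic encoder-only transformer for this kind of short-term forecast might look like the sketch below (PyTorch); the architecture, window lengths, and hyperparameters are hypothetical rather than the study's model, but the 288-step horizon corresponds to 24 hours at 5-minute resolution.

```python
import torch
import torch.nn as nn

class IrradianceTransformer(nn.Module):
    """Encoder-only transformer mapping a window of past 5-min irradiance
    values to the next `horizon` steps (a generic sketch, not the study's model)."""
    def __init__(self, d_model=64, nhead=4, num_layers=2, horizon=288):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)
        self.pos_emb = nn.Parameter(torch.randn(1, 1024, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, horizon)   # forecast from the last position

    def forward(self, x):                          # x: (batch, seq_len, 1)
        h = self.input_proj(x) + self.pos_emb[:, :x.size(1)]
        h = self.encoder(h)
        return self.head(h[:, -1])                 # (batch, horizon)

# 24-hour forecast at 5-min resolution = 288 steps ahead from a 2-day window.
model = IrradianceTransformer(horizon=288)
past = torch.randn(8, 576, 1)                      # 8 series, 576 past steps
forecast = model(past)                             # shape: (8, 288)
print(forecast.shape)
```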
Keywords
Solar Irradiance Forecast
Downscaling
Transformers
Time Series
Meta-analytic methods tend to take all-or-nothing approaches to study-level heterogeneity, either limiting the influence of studies suspected to diverge from a shared model or assuming all studies are homogeneous. In this paper, we develop a heterogeneity-adaptive meta-analysis for linear models that adapts to the amount of information shared between datasets. The primary mechanism for information sharing is shrinkage of the dataset-specific distributions toward a new "centroid" distribution through a Kullback-Leibler divergence penalty. The Kullback-Leibler divergence is uniquely suited, geometrically, to measuring relative information between datasets. We establish our estimator's desirable inferential properties without assuming homogeneity between dataset parameters. Among other things, we show that our estimator has a provably smaller mean squared error than the dataset-specific maximum likelihood estimators, and we establish asymptotically valid inference procedures. A comprehensive set of simulations illustrates our estimator's versatility, and an analysis of data from the eICU Collaborative Research Database illustrates its performance in a real-world setting.
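A toy version of the shrinkage mechanism is sketched below under a strong simplification: Gaussian linear models with a common, known error variance, where the KL penalty reduces to a quadratic and the penalized objective can be minimized by alternating between the dataset-specific coefficients and the centroid. The function, penalty parameterization, and updates are illustrative, not the paper's estimator.

```python
import numpy as np

def kl_shrinkage_meta(Xs, ys, lam=1.0, n_iter=50):
    """Toy heterogeneity-adaptive estimator for K Gaussian linear models.

    Minimizes  sum_k [ ||y_k - X_k b_k||^2 + lam * ||X_k (b_k - b_bar)||^2 ],
    i.e., a KL-type penalty (quadratic for Gaussians with a common error
    variance) shrinking each dataset's coefficients toward a centroid.
    Returns (per-dataset coefficients, centroid coefficients).
    """
    Gs = [X.T @ X for X in Xs]                    # Gram matrices
    b_mle = [np.linalg.solve(G, X.T @ y) for G, X, y in zip(Gs, Xs, ys)]
    b_bar = np.mean(b_mle, axis=0)

    for _ in range(n_iter):
        # Given the centroid, each b_k is a convex combination of its MLE and b_bar.
        bs = [(bk + lam * b_bar) / (1 + lam) for bk in b_mle]
        # Given {b_k}, the centroid is the Gram-weighted average of the b_k.
        b_bar = np.linalg.solve(sum(Gs), sum(G @ b for G, b in zip(Gs, bs)))
    return bs, b_bar

# Example: three sites, two with shared coefficients and one diverging.
rng = np.random.default_rng(2)
beta = [np.array([1.0, 2.0]), np.array([1.0, 2.0]), np.array([3.0, -1.0])]
Xs = [rng.normal(size=(100, 2)) for _ in beta]
ys = [X @ b + rng.normal(size=100) for X, b in zip(Xs, beta)]
bs, b_bar = kl_shrinkage_meta(Xs, ys, lam=2.0)
print(np.round(b_bar, 2), [np.round(b, 2) for b in bs])
```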
Keywords
Data integration
Penalized regression
Information geometry
Stein shrinkage
Data privacy
In observational studies, empirical calibration of p-values using negative control outcomes (NCOs) has emerged as a powerful tool for detecting and adjusting for systematic bias in treatment effect estimation. However, existing methods assume that all NCOs are valid (i.e., have a true null effect), an assumption often violated in real-world settings. This study introduces a mixture model-based approach to account for the presence of invalid NCOs. Our method estimates the null distribution of effect estimates while accommodating heterogeneous NCO validity, enhancing robustness against bias. Through simulation studies, we demonstrate that our approach improves bias correction and controls false discoveries. We apply this methodology to real-world healthcare datasets, showcasing its practical benefits in ensuring reliable causal inference. Our findings underscore the importance of flexible p-value calibration strategies in observational research, particularly when some NCOs may deviate from the true null hypothesis. By tolerating partial misclassification of NCOs, our approach advances empirical calibration toward greater robustness and generalizability.
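The sketch below illustrates the general idea with a two-component Gaussian mixture fit to simulated NCO estimates, taking the heavier component as the empirical null; it ignores the per-estimate standard errors that full empirical calibration incorporates, and it is not the authors' model. The data are simulated for illustration only.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Simulated log hazard-ratio estimates for 50 NCOs: most reflect only a small
# systematic bias (valid NCOs); a minority carry a genuine non-null effect.
valid = rng.normal(loc=0.1, scale=0.15, size=40)     # systematic error only
invalid = rng.normal(loc=0.8, scale=0.15, size=10)   # NCOs that are not truly null
nco_est = np.concatenate([valid, invalid])

# Two-component mixture; the heavier component is taken as the empirical null.
gm = GaussianMixture(n_components=2, random_state=0).fit(nco_est.reshape(-1, 1))
k0 = int(np.argmax(gm.weights_))
mu0, sd0 = gm.means_[k0, 0], np.sqrt(gm.covariances_[k0, 0, 0])

def calibrated_p(estimate):
    """Two-sided p-value of an outcome-of-interest estimate against the
    estimated empirical null (per-estimate standard errors ignored here)."""
    z = (estimate - mu0) / sd0
    return 2 * norm.sf(abs(z))

print(f"empirical null: mean={mu0:.2f}, sd={sd0:.2f}")
print(f"calibrated p for log-HR 0.5: {calibrated_p(0.5):.3f}")
```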
Keywords
Hypothesis Testing
Mixture Models
Negative Control Outcomes
Observational Studies
p-value Calibration
Co-Author(s)
Dazheng Zhang
Huiyuan Wang, University of Pennsylvania
Wenjie Hu, University of Pennsylvania School of Medicine - Philadelphia, PA
Qiong Wu, University of Pittsburgh
Howard Chan, University of Pennsylvania
Lu Li
Patrick Ryan, Johnson & Johnson
Marc Suchard, University of California-Los Angeles
Martijn Schuemie, Observational Health Data Sciences and Informatics
George Hripcsak, Columbia University
Yong Chen, University of Pennsylvania, Perelman School of Medicine
First Author
Bingyu Zhang
Presenting Author
Bingyu Zhang
In a hearing clinical trial comparing tinnitus patients to a control group, noise exposure was recorded every 3.75 minutes over 7 days. Tinnitus is the perception of sound in the ears or head without an external source. We present an application-driven approach to time series denoising and group comparison in analyzing sound exposure patterns between the two groups. Instead of traditional two-sample comparison methods, functional data analysis (FDA) was employed. Noise exposure sequences were decomposed into group-specific mean and residual functions, preserving both group-level trends and individual variation. This FDA-based denoising procedure reduced random fluctuations, enhancing the detection of systematic group differences. For statistical inference, a basis-function-based simultaneous confidence band was constructed from the denoised sequences. The simultaneous confidence band results closely aligned with pointwise Wilcoxon tests adjusted by the Benjamini-Hochberg (B-H) procedure, with the most pronounced differences occurring at particular times of the day. This approach demonstrates the effectiveness of functional data analysis for time series denoising and structured group comparisons.
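A toy sketch of the basis-smoothing-plus-simultaneous-band workflow is below, using a Fourier basis (the abstract does not specify the basis family), simulated one-day exposure curves, and a bootstrap sup-t critical value; the grid size, group sizes, and all other settings are illustrative rather than the study's.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 96                                   # toy one-day grid (15-min steps)
t = np.linspace(0, 1, T)

def fourier_basis(t, n_harmonics=4):
    cols = [np.ones_like(t)]
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * h * t), np.cos(2 * np.pi * h * t)]
    return np.column_stack(cols)

B = fourier_basis(t)                     # T x q basis matrix

def smooth(curves):
    """Least-squares projection of each subject's noisy series onto the basis."""
    coef, *_ = np.linalg.lstsq(B, curves.T, rcond=None)
    return (B @ coef).T                  # subjects x T, denoised

# Toy data: one group has higher exposure in the evening hours.
n1, n2 = 30, 30
mean1 = 60 + 10 * np.exp(-((t - 0.8) ** 2) / 0.01)
mean2 = 60 + np.zeros(T)
g1 = smooth(mean1 + rng.normal(0, 8, (n1, T)))
g2 = smooth(mean2 + rng.normal(0, 8, (n2, T)))

diff = g1.mean(0) - g2.mean(0)
se = np.sqrt(g1.var(0, ddof=1) / n1 + g2.var(0, ddof=1) / n2)

# Bootstrap the sup-t statistic to get a simultaneous (not pointwise) band.
sup_t = []
for _ in range(500):
    b1 = g1[rng.integers(0, n1, n1)]
    b2 = g2[rng.integers(0, n2, n2)]
    d = b1.mean(0) - b2.mean(0) - diff
    s = np.sqrt(b1.var(0, ddof=1) / n1 + b2.var(0, ddof=1) / n2)
    sup_t.append(np.max(np.abs(d / s)))
c = np.quantile(sup_t, 0.95)
lower, upper = diff - c * se, diff + c * se
print("fraction of the day where the band excludes zero:",
      np.mean((lower > 0) | (upper < 0)))
```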
Keywords
Tinnitus
Functional Data Analysis
Clinical Research
Time Series
Intensive Longitudinal Data