Bridge Sample Selection and Batch Correction in High-Dimensional Data

Rondi A Butler Co-Author
Brown University School of Public Health, Providence, RI, USA
 
Lucas A Salas Co-Author
Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
 
Brock C Christensen Co-Author
Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
 
Karl T Kelsey Co-Author
Brown University School of Public Health, Providence, RI, USA
 
Devin C Koestler Co-Author
University of Kansas Medical Center, Kansas City, KS, USA
 
Md Saiful Islam Saif First Author
 
Md Saiful Islam Saif Presenting Author
 
Thursday, Aug 7: 12:05 PM - 12:20 PM
2450 
Contributed Papers 
Music City Center 
Batch effects, or technical variation across experimental batches, can obscure true biological signals and pose a significant challenge to the analysis and interpretation of 'omics data. This is particularly true of proteomics data generated using Olink® Target technology. To mitigate batch effects, Olink recommends using bridge samples; however, it is unclear whether bridge samples are always necessary. Furthermore, if bridge samples are needed, key questions arise about how many bridge samples to include and how to select them. To shed light on these questions, we conducted a systematic evaluation of three batch correction approaches (Olink's bridge sample method, COMBAT, and Remeasure, an approach for designs in which case-control status is confounded with batch) across three study designs: (1) cases and controls processed in separate batches, (2) cases and controls mixed within each batch, and (3) cases distributed across multiple batches. Using simulations that closely reflect real-world datasets, we assessed the impact of each batch correction method on statistical power, Type I error, and false discovery rate. Our results provide guidance on which correction method performs best in each scenario and on the optimal number of bridge samples needed for effective correction. We further validated our findings by applying these methods to a real dataset. While our study focuses on batch correction of Olink proteomics data, the methodologies and insights presented here may apply to other high-dimensional omics datasets that face similar challenges.
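To make the idea of bridge-sample correction concrete, the sketch below assumes the per-assay rule commonly described for Olink bridging: for each assay, the adjustment is the median, over bridge samples measured in both batches, of the paired NPX differences, and that shift is added to every sample in the batch being aligned. The data layout, the function names `bridge_adjustments` and `apply_adjustments`, and all sample IDs and NPX values are hypothetical illustrations; this is not the OlinkAnalyze implementation or the authors' simulation code.

```python
from statistics import median

# Assumed data layout for this sketch: each batch is a dict mapping
# sample_id -> {assay_name: NPX value (log2 scale)}.


def bridge_adjustments(reference, new, bridge_ids):
    """Per-assay shift that aligns `new` to `reference`.

    For each assay, the shift is the median over bridge samples of the
    paired difference (reference NPX - new NPX).
    """
    if not bridge_ids:
        raise ValueError("at least one bridge sample is required")
    # Assumes both batches measured the same assay panel.
    assays = reference[bridge_ids[0]].keys()
    return {
        assay: median([reference[s][assay] - new[s][assay] for s in bridge_ids])
        for assay in assays
    }


def apply_adjustments(batch, adjustments):
    """Add each assay's shift to every sample in `batch` (bridges included)."""
    return {
        sample: {assay: npx + adjustments[assay] for assay, npx in values.items()}
        for sample, values in batch.items()
    }


# Hypothetical example: three bridge samples (B1-B3) run in both batches,
# plus one non-bridge sample (S9) run only in the new batch.
reference = {
    "B1": {"IL6": 5.0, "TNF": 3.0},
    "B2": {"IL6": 6.0, "TNF": 2.5},
    "B3": {"IL6": 4.0, "TNF": 3.5},
}
new = {
    "B1": {"IL6": 4.5, "TNF": 3.2},
    "B2": {"IL6": 5.4, "TNF": 2.6},
    "B3": {"IL6": 3.6, "TNF": 3.8},
    "S9": {"IL6": 7.0, "TNF": 1.0},
}

adj = bridge_adjustments(reference, new, ["B1", "B2", "B3"])
# IL6 shift: median of (0.5, 0.6, 0.4) = 0.5; TNF shift: median of (-0.2, -0.1, -0.3) = -0.2
corrected = apply_adjustments(new, adj)
print(adj)
print(corrected["S9"])
```

Each shift in this sketch is estimated from only as many paired differences as there are bridge samples, so with few bridge samples the estimate is noisier; this is one way to see why the number of bridge samples matters. The rule also removes only an additive, per-assay offset between batches.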

Keywords

Batch Effect

Bridge Sample Selection

OlinkAnalyze

COMBAT

High-Dimensional Data

Olink Proteomics 

Main Sponsor

Section on Statistics in Genomics and Genetics