Monday, Aug 4: 2:00 PM - 3:50 PM
0816
Topic-Contributed Paper Session
Music City Center
Room: CC-207D
Applied
Yes
Main Sponsor
WNAR
Co Sponsors
ENAR
Section on Statistics in Genomics and Genetics
Presentations
Logic rules that combine OR and AND threshold operations are widely used in early cancer detection because they are simple and interpretable. Such rules support the development of biomarker panels that optimize performance under constraints, for example maximizing sensitivity while maintaining high specificity. However, traditional approaches such as classification trees (CART) often fail to meet these constraints despite their predictive accuracy. We propose a novel method that uses decision lists (sequential if-then rules that map covariates to outcomes) to develop parsimonious combinatory threshold rules for biomarkers. The method allows biomarkers to be measured sequentially, so further tests are unnecessary once an earlier condition is met, which reduces patient and specimen burden. Our simulations and an application to pancreatic cancer data show that the method outperforms comparator approaches in meeting the optimization constraints.
Keywords
Logic rule
decision list
cancer biomarker
constrained optimization
classification
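The sequential-measurement idea in the decision-list abstract above can be illustrated with a minimal sketch. It is not the authors' method: the biomarker names (`marker_A`, `marker_B`), the cutoffs, and the `classify` helper are hypothetical, and the rule is a plain OR-type list with no constrained optimization of the thresholds.

```python
from typing import Callable, List, Tuple

# A rule is (biomarker name, threshold); names and cutoffs here are hypothetical.
Rule = Tuple[str, float]


def classify(rules: List[Rule], measure: Callable[[str], float]) -> Tuple[int, List[str]]:
    """Walk an OR-type decision list, measuring each biomarker only when reached.

    Returns the predicted label (1 = positive) and the biomarkers actually measured.
    """
    measured: List[str] = []
    for name, threshold in rules:
        value = measure(name)      # the assay is run only at this point
        measured.append(name)
        if value > threshold:      # the first rule that fires ends the sequence
            return 1, measured
    return 0, measured             # no rule fired: default to negative


rules = [("marker_A", 37.0), ("marker_B", 2.0)]
patient = {"marker_A": 50.0, "marker_B": 1.2}

label, used = classify(rules, patient.__getitem__)
print(label, used)  # 1 ['marker_A']
```

Because the first rule fires for this patient, `marker_B` is never measured, which is the burden reduction the abstract describes.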
Large-scale gene expression studies make it possible to construct gene networks that uncover associations among genes. To study direct associations among genes, networks based on partial correlations are preferred over those based on marginal correlations. However, false discovery rate (FDR) control for partial correlation-based network construction is not well studied. In addition, currently available partial correlation-based methods cannot incorporate existing biological knowledge into network construction while controlling the FDR. In this talk, we propose a method called Partial Correlation Graph with Information Incorporation (PCGII). PCGII estimates the partial correlation between each pair of genes by regularized node-wise regression, which can incorporate prior knowledge while controlling for the effects of all other genes. It handles high-dimensional data in which the number of genes can be much larger than the sample size, and it controls the FDR at the same time. We compare PCGII with several existing approaches through extensive simulation studies and demonstrate that PCGII has better FDR control and higher power. We apply PCGII to a plant gene expression dataset, where it recovers confirmed regulatory relationships and a hub node, as well as several direct associations that shed light on potential functional relationships in the system. We also introduce a method that supplements the observed data with a pseudogene so that PCGII can be applied when no prior information is available; this also allows FDR control and power to be checked in real data analysis.
Keywords
gene network
partial correlation
graphical models
PCGII
FDR control
conditional association
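The distinction between marginal and partial correlation that motivates the abstract above can be seen in a three-gene toy example. This sketch is not PCGII: it uses the textbook first-order partial correlation formula for a single conditioning variable, with no regularization, no prior knowledge, and no FDR control. The gene names `g1`, `g2`, `g3` and the simulated chain are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy chain g1 -> g2 -> g3: g1 and g3 are linked only through g2.
rng = random.Random(0)
g1 = [rng.gauss(0, 1) for _ in range(5000)]
g2 = [a + rng.gauss(0, 1) for a in g1]
g3 = [b + rng.gauss(0, 1) for b in g2]

marginal = pearson(g1, g3)           # clearly nonzero: an indirect association
partial = partial_corr(g1, g3, g2)   # close to zero: no direct association
print(round(marginal, 2), round(partial, 2))
```

A marginal-correlation network would draw an edge between `g1` and `g3`; a partial-correlation network would not, which is why the latter is preferred for direct associations.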
Transcriptome-wide association studies (TWAS) are valuable tools for identifying gene-level associations by integrating genome-wide association studies with gene expression data. However, most TWAS methods focus solely on linear associations between genes and traits, overlooking the complex nonlinear relationships that often exist in biological systems. To address this limitation, we propose a novel framework called QTWAS, which incorporates a quantile-based gene expression model into the TWAS framework. This approach allows for the detection of both nonlinear and heterogeneous gene-trait associations. Through extensive simulations and applications to both continuous and binary traits, we demonstrate that QTWAS is more powerful than conventional TWAS methods in identifying gene-trait associations.
Keywords
Quantile regression
TWAS
nonlinear genetic associations
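For readers unfamiliar with quantile regression, the building block behind the quantile-based model mentioned in the abstract above is the check (pinball) loss. The sketch below is not the QTWAS model: it only shows, on made-up numbers, that minimizing this loss over a constant recovers a sample quantile. The function names and the data are hypothetical.

```python
def pinball_loss(y, prediction, tau):
    """Average check loss: under-predictions cost tau, over-predictions cost 1 - tau."""
    total = 0.0
    for value in y:
        u = value - prediction
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(y)


def best_constant(y, tau):
    """Observed value that minimizes the pinball loss, i.e. a sample tau-quantile."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, c, tau))


# Made-up expression values with one extreme observation.
expression = [1.0, 2.0, 3.0, 4.0, 100.0]

print(best_constant(expression, 0.5))  # 3.0
print(best_constant(expression, 0.9))  # 100.0
```

Fitting the loss at several values of `tau` describes different parts of the distribution, which is how quantile models can pick up heterogeneous effects that a single mean-based linear fit would miss.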
The advancement of spatial transcriptomics (ST) technology has revolutionized the ability to profile gene expression while retaining spatial information, offering unprecedented potential to unravel the complexities of cellular function and structure. Cell-cell interactions play a pivotal role in shaping biological processes, and the evolution of ST technologies has led to the development of numerous tools for their analysis. However, existing methods often struggle with accurately quantifying interactions due to the mixture of cell types within tissue samples. Additionally, these methods frequently overlook or encounter challenges in detecting interactions involving rare cell types, which can play crucial roles in disease onset and progression. In this work, we propose a novel method that addresses these limitations by incorporating the complexity of cellular composition into a spatially aware framework for cell-cell interaction quantification, designed specifically for widely used ST platforms such as 10x Visium ST. Our framework leverages information from neighboring spots to robustly quantify both local and global interaction patterns while providing tailored approaches for detecting interactions involving rare cell types. The effectiveness of the proposed method is demonstrated through real data-based simulation studies and its application to spatial transcriptomics datasets from pancreatic cancer patients.
Co-Author
Ziyi Li, MD Anderson Cancer Center
Speaker
Ziyi Li, MD Anderson Cancer Center
Survival risk prediction is an important task in clinical cancer research. By virtue of simultaneously measuring the transcription of thousands of markers, transcriptomic sequencing holds the potential for predicting survival risk based on patients' transcriptomic profiles. Like many high-throughput platforms, transcriptomic sequencing suffers from the ubiquitous presence of batch effects. We previously developed the BATch MitigAtion via stratificatioN (BatMan) method to adjust for batch effects in transcriptomic microarray data. In this study, we extend BatMan to sequencing data. The discrete nature of sequencing count data and the presence of sequencing depth variation make it challenging to simulate batch effects. We use a Gamma-Poisson model to introduce batch effects into expression data and extensively assess the performance of BatMan in comparison with ComBat-Seq and sequencing depth normalization methods. We found that BatMan outperforms ComBat-Seq in all simulation scenarios. We applied BatMan to a dataset from sarcoma patients at Memorial Sloan Kettering Cancer Center to demonstrate its performance in survival risk prediction with real-world data.
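A Gamma-Poisson model of the kind mentioned above can be sketched in a few lines. The abstract does not say how the batch effect enters the model, so the multiplicative `batch_factor` on the mean is an assumption, as are the function names and all parameter values; this is not the simulation design used in the study.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def gamma_poisson_count(rng, mean, shape, batch_factor=1.0):
    """Draw rate ~ Gamma(shape, scale = mean * batch_factor / shape), then count ~ Poisson(rate).

    batch_factor is a hypothetical multiplicative batch effect on the mean.
    """
    rate = rng.gammavariate(shape, mean * batch_factor / shape)
    return poisson(rng, rate)


rng = random.Random(1)
batch_a = [gamma_poisson_count(rng, mean=20.0, shape=5.0) for _ in range(2000)]
batch_b = [gamma_poisson_count(rng, mean=20.0, shape=5.0, batch_factor=2.0)
           for _ in range(2000)]

# With the same underlying gene, batch B counts are about twice as large on average.
ratio = (sum(batch_b) / len(batch_b)) / (sum(batch_a) / len(batch_a))
print(round(ratio, 2))
```

The Gamma mixing makes the counts overdispersed relative to a plain Poisson, which matches the discrete, variable nature of sequencing counts that the abstract points to.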