Data with Structured Correlation: Network, Spatial, and Temporal Modeling

Trijya Singh, Chair
 
Thursday, Aug 7: 10:30 AM - 12:20 PM
4220 
Contributed Papers 
Music City Center 
Room: CC-Davidson Ballroom A3 

Main Sponsor

Royal Statistical Society

Presentations

A Bayesian approach to model uncertainty in unsupervised learning from single-cell genomic data

Network models provide a powerful framework for analysing single-cell count data, facilitating the characterisation of cellular identities, disease mechanisms, and developmental trajectories. However, uncertainty modeling in unsupervised learning with genomic data remains insufficiently explored. Conventional clustering methods assign a singular identity to each cell, potentially obscuring transitional states during differentiation or transformation. This study introduces a variational Bayesian framework for clustering and analysing single-cell genomic data, employing a Bayesian Gaussian mixture model to estimate the probabilistic association of cells with distinct clusters. This approach captures cellular transitions, yielding biologically coherent insights into neurogenesis and breast cancer progression. The inferred clustering probabilities enable further analyses, including Differential Expression Analysis, Gene Set Enrichment Analysis, and pseudotime analysis. Furthermore, we develop a novel quantitative measure to validate unsupervised learning with scRNA-seq data, reflecting a more authentic correspondence between clustering outcomes and marker genes. This methodological advancement enhances the resolution of single-cell data analysis, enabling a more nuanced characterisation of dynamic cellular identities in development and disease. 
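As a rough illustration of why probabilistic cluster membership can reveal transitional cells, the sketch below computes posterior membership probabilities under a two-component Gaussian mixture in one dimension. It is not the authors' variational Bayesian procedure: the mixture parameters are fixed by hand rather than inferred, and all names and values (`membership_probabilities`, the means, the weights) are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, weights, means, sds):
    """Posterior probability that observation x belongs to each component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical mixture: two equally weighted clusters along one reduced dimension.
weights = [0.5, 0.5]
means = [0.0, 4.0]
sds = [1.0, 1.0]

# A cell at the centre of cluster 0 versus a cell halfway between the clusters.
committed = membership_probabilities(0.0, weights, means, sds)
transitional = membership_probabilities(2.0, weights, means, sds)

print("committed cell:   ", committed)
print("transitional cell:", transitional)
```

A hard clustering would force the second cell into one cluster, whereas the soft assignment splits its membership evenly, which is the kind of signal the abstract associates with cells in transition.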

Keywords

Unsupervised learning

Variational Bayesian Estimation of a Gaussian Mixture

Pseudotime analysis

Dimensionality reduction

Single-cell genomics

Embryo cortical development and breast cancer progression

Co-Author(s)

Thomas E. Bartlett, University College London
Lina Gerontogianni, The Francis Crick Institute
Swati Chandna, Birkbeck, University of London

First Author

Shanshan Ren, University College London

Presenting Author

Shanshan Ren, University College London

Broken-Trends in City Prices: Detecting Breaks and Forecasting Aggregated Inflation

Accurate inflation forecasts are essential, especially in a country that follows an inflation-targeting strategy. With that in mind, this work aggregates regional price indices from 23 Colombian cities to forecast national inflation. We fit individual time series models to each city's price index to identify price level changes in the city series. Using trend models that incorporate these breaks, we forecast monthly and annual total inflation. Our results show that including trend breaks and disaggregated information improves the accuracy of annual inflation prediction across many time horizons and is competitive with item-by-item aggregated forecasts. We obtained RMSFE gains of around 13% and 45% for one and two months ahead, respectively, relative to an aggregated ARIMA model. The forecasts for the end of 2025 are close to 4.5%.
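For readers unfamiliar with the evaluation metric, the sketch below shows how a root mean squared forecast error (RMSFE) and a relative gain over a benchmark could be computed, together with a weighted aggregation of city-level forecasts. The function names, the weights, and all numbers are made up for illustration; they are not the paper's data or results.

```python
import math


def rmsfe(actual, forecast):
    """Root mean squared forecast error over paired observations."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_gain(rmsfe_model, rmsfe_benchmark):
    """Fractional reduction in RMSFE relative to the benchmark (0.13 means 13%)."""
    return 1.0 - rmsfe_model / rmsfe_benchmark


def aggregate(city_forecasts, city_weights):
    """Weighted average of city-level forecasts (weights need not sum to one)."""
    total = sum(city_weights)
    return sum(f * w for f, w in zip(city_forecasts, city_weights)) / total


# Toy numbers, purely illustrative.
actual = [4.0, 5.0, 6.0]
benchmark_forecast = [5.0, 6.0, 7.0]
model_forecast = [4.5, 5.5, 6.5]

gain = relative_gain(rmsfe(actual, model_forecast), rmsfe(actual, benchmark_forecast))
national = aggregate([4.0, 6.0], [3.0, 1.0])
print(f"relative RMSFE gain: {gain:.0%}, aggregated forecast: {national}")
```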

Keywords

Consumer Price Indexes

Linear Trend Models

Structural Breaks

Forecasting

Regional Forecast Aggregation 

Co-Author

Héctor Manuel Zarate Solano, Banco de la República

First Author

Norberto Rodríguez-Niño, Banco de la República

Presenting Author

Norberto Rodríguez-Niño, Banco de la República

Simultaneous global and local clustering in multiplex networks with covariate information

We introduce a new model that simultaneously detects communities within individual layers of a multiplex network while inferring a global node clustering across the layers. A Stochastic Block Model (SBM) is assumed in each layer, with probabilities of layer-level group memberships determined by a node's global group assignment. Our model uses a Bayesian framework, employing a probit stick-breaking process to construct node-specific mixing proportions over a set of shared Griffiths-Engen-McCloskey (GEM) distributions. These proportions determine layer-level community assignment, allowing for an unknown and varying number of groups across layers, while incorporating nodal covariate information to inform the global clustering. We propose a scalable variational procedure with parallelisable updates for application to large networks. Extensive simulation studies demonstrate our model's ability to accurately recover both global and layer-level clusters in complicated settings, and applications to real data showcase the model's effectiveness in uncovering interesting latent network structure.
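The stick-breaking constructions mentioned in the abstract can be sketched generically. In the minimal example below, a truncated GEM draw uses Beta(1, alpha) break fractions, while a probit variant passes real-valued scores through the standard normal CDF. How the authors link those scores to nodal covariates is not described in the abstract, so the scores, the truncation level, and `alpha` here are placeholder assumptions.

```python
import math
import random


def stick_breaking(fractions):
    """Turn break fractions v_1..v_K in (0, 1) into K + 1 mixture weights."""
    weights, remaining = [], 1.0
    for v in fractions:
        weights.append(v * remaining)  # break off a share of what is left
        remaining *= 1.0 - v
    weights.append(remaining)  # leftover mass goes to a final catch-all component
    return weights


def std_normal_cdf(x):
    """Standard normal CDF, used as the probit link."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Truncated GEM(alpha) draw: break fractions are Beta(1, alpha).
rng = random.Random(0)
alpha = 2.0  # hypothetical concentration parameter
gem_weights = stick_breaking([rng.betavariate(1.0, alpha) for _ in range(10)])

# Probit stick-breaking: fractions are Phi(score); the scores here are placeholders.
probit_weights = stick_breaking([std_normal_cdf(eta) for eta in [0.0, 0.0, 0.0]])
print(probit_weights)
```

With all scores set to zero, every break takes half of the remaining stick, so the weights halve at each step until the final catch-all component.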

Keywords

Multiplex networks

Community detection

Dirichlet process

Stochastic block model

Variational inference 

Co-Author(s)

Edward Cohen, Imperial College London
Francesco Sanna Passino, Imperial College London
James Martin, Imperial College London
Lekha Patel
Kurtis Shuler, Sandia National Laboratories

First Author

Joshua Corneck, Imperial College London

Presenting Author

Joshua Corneck, Imperial College London

Composite Transportation Divergence and Finite Mixture Models

When statistical data is large and distributed across multiple locations, initial estimates of the population distribution are often computed locally and then aggregated centrally. For parametric models, simple averaging typically ensures optimal convergence rates. However, in finite mixture models, where the parameter space is non-Euclidean, aggregation requires more refined methods due to computational and statistical challenges.

To address these issues, we propose using composite transportation divergence to aggregate mixture distributions, yielding an estimator that is optimal under the defined criteria. We develop an MM algorithm that guarantees convergence to at least a local optimum in a finite number of iterations. Our approach also applies to Gaussian mixture reduction, approximating a high-order mixture with a lower-order one. Under slightly stronger assumptions, the aggregated estimator retains its optimal convergence rate and can be made tolerant to Byzantine failures. 
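To give a feel for mixture reduction by alternating steps, the sketch below reduces a univariate Gaussian mixture with a clustering-style iteration: each original component is assigned to the closest reduced component, and each reduced component is then refit by moment matching. The choice of Kullback-Leibler divergence as the cost between components, the hard assignment, and the fixed initialisation are assumptions made for this illustration; the abstract does not specify them, and this should not be read as the authors' algorithm.

```python
import math


def kl_gauss(m1, v1, m2, v2):
    """KL divergence from N(m1, v1) to N(m2, v2), with v1 and v2 as variances."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def reduce_mixture(components, init, n_iter=50):
    """Reduce a mixture given as (weight, mean, variance) triples.

    Alternates between assigning each original component to its nearest
    reduced component and refitting each reduced component by moment matching.
    """
    reduced = list(init)
    for _ in range(n_iter):
        groups = [[] for _ in reduced]
        for w, m, v in components:
            costs = [kl_gauss(m, v, rm, rv) for _, rm, rv in reduced]
            groups[costs.index(min(costs))].append((w, m, v))
        updated = []
        for old, group in zip(reduced, groups):
            if not group:  # nothing assigned: keep the shape, drop the weight
                updated.append((0.0, old[1], old[2]))
                continue
            w_sum = sum(w for w, _, _ in group)
            mean = sum(w * m for w, m, _ in group) / w_sum
            var = sum(w * (v + (m - mean) ** 2) for w, m, v in group) / w_sum
            updated.append((w_sum, mean, var))
        if updated == reduced:  # assignments and parameters no longer change
            break
        reduced = updated
    return reduced


# Hypothetical four-component mixture with two well-separated pairs.
original = [(0.25, -0.1, 1.0), (0.25, 0.1, 1.0), (0.25, 9.9, 1.0), (0.25, 10.1, 1.0)]
start = [(0.5, -0.1, 1.0), (0.5, 9.9, 1.0)]
reduced = reduce_mixture(original, start)
print(reduced)
```

Because there are only finitely many ways to assign components to groups, an iteration of this kind stops once the assignment repeats, which mirrors the finite-iteration convergence property the abstract claims for its MM algorithm.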

Keywords

Composite transportation distance

distributed learning

finite mixture model

mixture reduction

MM algorithm

Co-Author

Qiong Zhang, Renmin University of China

First Author

Jiahua Chen, University of British Columbia

Presenting Author

Jiahua Chen, University of British Columbia

Interpretable and Efficient Brain Image Analysis: Addressing CNN Black Box Challenge

Brain image analysis is a rapidly advancing field, yet accurately identifying Regions of Interest (ROIs) remains challenging due to the limitations of traditional methods in precision, efficiency, and interpretability. While neural networks effectively handle large datasets and learn complex features, they often demand substantial computational resources and lengthy training times, and they lack transparency.

To overcome these challenges, we propose an innovative method that enhances ROI identification accuracy and interpretability while improving computational efficiency. Our approach integrates classification-based uncertainty estimation and probability-driven techniques, employing adaptive sampling via Shannon entropy and a mean-based probability framework. Block kriging and statistical inference further enable efficient and precise hotspot detection, significantly reducing training time without sacrificing performance.

The proposed method integrates seamlessly with Convolutional Neural Networks (CNNs), offering accurate hotspot detection with reduced computational complexity. A subset of the Traumatic Brain Injury (TRACK-TBI) study dataset is analyzed to demonstrate its effectiveness. 
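One way to read "adaptive sampling via Shannon entropy" is that image blocks whose class probabilities are most uncertain are prioritised for further sampling. The sketch below illustrates that reading only; the block names, probabilities, and the `most_uncertain` helper are hypothetical, and the abstract does not say how the authors combine entropy with block kriging or the mean-based probability framework.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def most_uncertain(block_probs, k):
    """Names of the k blocks with the highest classification entropy."""
    ranked = sorted(
        block_probs,
        key=lambda name: shannon_entropy(block_probs[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical per-block class probabilities (e.g. hotspot versus background).
block_probs = {
    "block_a": [0.5, 0.5],  # maximally uncertain
    "block_b": [0.9, 0.1],  # fairly confident
    "block_c": [1.0, 0.0],  # certain
}

selected = most_uncertain(block_probs, 2)
print(selected)
```

Spending the sampling budget on high-entropy blocks and skipping confidently classified ones is one plausible route to the reduced training time the abstract describes.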

Keywords

Region of Interest (ROI)

Convolutional Neural Networks (CNNs)

Computational efficiency

Shannon Entropy

mean-based probability 

Co-Author

Jihnhee Yu, SUNY, University at Buffalo

First Author

HyunAh Lee

Presenting Author

HyunAh Lee