Thursday, Aug 7: 10:30 AM - 12:20 PM
4220
Contributed Papers
Music City Center
Room: CC-Davidson Ballroom A3
Main Sponsor
Royal Statistical Society
Presentations
Network models provide a powerful framework for analysing single-cell count data, facilitating the characterisation of cellular identities, disease mechanisms, and developmental trajectories. However, uncertainty modelling in unsupervised learning with genomic data remains insufficiently explored. Conventional clustering methods assign a single identity to each cell, potentially obscuring transitional states during differentiation or transformation. This study introduces a variational Bayesian framework for clustering and analysing single-cell genomic data, employing a Bayesian Gaussian mixture model to estimate the probabilistic association of cells with distinct clusters. This approach captures cellular transitions, yielding biologically coherent insights into neurogenesis and breast cancer progression. The inferred clustering probabilities enable further analyses, including differential expression analysis, gene set enrichment analysis, and pseudotime analysis. Furthermore, we develop a novel quantitative measure to validate unsupervised learning with scRNA-seq data, reflecting a more authentic correspondence between clustering outcomes and marker genes. This methodological advancement enhances the resolution of single-cell data analysis, enabling a more nuanced characterisation of dynamic cellular identities in development and disease.
Keywords
Unsupervised learning
Variational Bayesian Estimation of a Gaussian Mixture
Pseudotime analysis
Dimensionality reduction
Single-cell genomics
Embryo cortical development and Breast cancer progression
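The abstract above contrasts hard cluster labels with probabilistic cluster membership. The following is a minimal sketch of that soft-assignment step only, assuming a univariate Gaussian mixture whose parameters are already known. The function names and the two-cluster parameters are hypothetical illustrations; the variational Bayesian fitting described in the abstract is not reproduced here.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def soft_assign(x, weights, means, variances):
    """Posterior probability that observation x belongs to each mixture component."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-cluster mixture on a single reduced dimension (assumed values).
weights = [0.5, 0.5]
means = [-2.0, 2.0]
variances = [1.0, 1.0]

# A cell at the centre of cluster 0 is assigned almost entirely to that cluster.
committed = soft_assign(-2.0, weights, means, variances)

# A cell halfway between the clusters keeps split membership, which a hard
# label would hide.
transitional = soft_assign(0.0, weights, means, variances)
```

Membership vectors of this kind, rather than single labels, are what would feed downstream analyses that weight each cell by its cluster probability.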
The importance of forecasting inflation as accurately as possible cannot be overstated, especially in a country that follows an inflation-targeting strategy. With that in mind, this work aggregates regional price indices from 23 Colombian cities to forecast national inflation. We fit individual time series models to each city's price index to identify breaks in the price level. Using trend models that incorporate these breaks, we forecast monthly and annual total inflation. Our results show that including trend breaks and disaggregated information improves the accuracy of annual inflation forecasts across many horizons and is competitive with item-by-item aggregated forecasts. We obtain RMSFE gains of around 13% and 45% at the one- and two-month horizons, respectively, relative to an aggregated ARIMA model. The forecasts for the end of 2025 are close to 4.5%.
Keywords
Consumer Price Indexes
Linear Trend Models
Structural Breaks
Forecasting
Regional Forecast Aggregation
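The abstract above reports accuracy as a gain in root mean squared forecast error (RMSFE) relative to a benchmark. A minimal sketch of how such a gain can be computed follows. The helper names and all numbers are made up for illustration and are not taken from the paper, and defining the gain as one minus the RMSFE ratio is an assumption about the convention used.

```python
import math


def rmsfe(actual, forecast):
    """Root mean squared forecast error over paired observations."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_gain(rmsfe_model, rmsfe_benchmark):
    """Fractional RMSFE reduction of a model relative to a benchmark (assumed convention)."""
    return 1.0 - rmsfe_model / rmsfe_benchmark


# Made-up annual inflation rates (percent) and two sets of forecasts.
actual = [5.0, 5.2, 5.1, 4.9]
benchmark_forecast = [5.4, 4.8, 5.5, 4.5]   # stands in for an aggregated benchmark
candidate_forecast = [5.2, 5.0, 5.3, 4.7]   # stands in for a disaggregated model

gain = relative_gain(
    rmsfe(actual, candidate_forecast),
    rmsfe(actual, benchmark_forecast),
)
```

A positive gain means the candidate model has the lower RMSFE. Under this convention, a reported gain of 13% would correspond to an RMSFE ratio of 0.87.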
We introduce a new model that simultaneously detects communities within individual layers of a multiplex network while inferring a global node clustering across the layers. A Stochastic Block Model (SBM) is assumed in each layer, with probabilities of layer-level group memberships determined by a node's global group assignment. Our model uses a Bayesian framework, employing a probit stick-breaking process to construct node-specific mixing proportions over a set of shared Griffiths-Engen-McCloskey (GEM) distributions. These proportions determine layer-level community assignment, allowing for an unknown and varying number of groups across layers, while incorporating nodal covariate information to inform the global clustering. We propose a scalable variational procedure with parallelisable updates for application to large networks. Extensive simulation studies demonstrate our model's ability to accurately recover both global and layer-level clusters in complicated settings, and applications to real data showcase the model's effectiveness in uncovering interesting latent network structure.
Keywords
Multiplex networks
Community detection
Dirichlet process
Stochastic block model
Variational inference
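The abstract above builds mixing proportions with a probit stick-breaking process. As a rough sketch of the general construction, each real-valued score is mapped through the standard normal CDF to a break fraction, and each weight is that fraction of the stick remaining after earlier breaks. The function name, the truncation to a finite number of sticks, and the input scores are assumptions for illustration. How the model ties the scores to nodal covariates is not specified in the abstract and is not modelled here.

```python
from statistics import NormalDist


def probit_stick_breaking(scores):
    """Map real-valued scores to mixing weights that sum to one.

    v_k = Phi(score_k) is the fraction broken off the remaining stick, and
    w_k = v_k * prod_{j<k} (1 - v_j). The mass left after the last break is
    assigned to one final component (a finite truncation, assumed here).
    """
    phi = NormalDist().cdf  # standard normal CDF, i.e. the probit link
    weights = []
    remaining = 1.0
    for score in scores:
        fraction = phi(score)
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    weights.append(remaining)
    return weights


# Hypothetical scores: zero gives a break fraction of one half at every step.
weights = probit_stick_breaking([0.0, 0.0, 0.0])
```

Larger scores push more mass onto earlier components, which is how covariate-dependent scores could shift a node's mixing proportions.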
When statistical data is large and distributed across multiple locations, initial estimates of the population distribution are often computed locally and then aggregated centrally. For parametric models, simple averaging typically ensures optimal convergence rates. However, in finite mixture models, where the parameter space is non-Euclidean, aggregation requires more refined methods due to computational and statistical challenges.
To address these issues, we propose using composite transportation divergence to aggregate mixture distributions, yielding an estimator that is optimal under the defined criteria. We develop an MM algorithm that guarantees convergence to at least a local optimum in a finite number of iterations. Our approach also applies to Gaussian mixture reduction, approximating a high-order mixture with a lower-order one. Under slightly stronger assumptions, the aggregated estimator retains its optimal convergence rate and can be made tolerant to Byzantine failures.
Keywords
Composite transportation distance
distributed learning
finite mixture model
mixture reduction
MM algorithm
Co-Author
Qiong Zhang, Renmin University of China
First Author
Jiahua Chen, University of British Columbia
Presenting Author
Jiahua Chen, University of British Columbia
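The abstract above mentions Gaussian mixture reduction, that is, approximating a high-order mixture with a lower-order one through an iterative algorithm. The sketch below illustrates one simple scheme of that kind for univariate Gaussians. It assumes the Kullback-Leibler divergence as the cost between components, so each step assigns every original component to its closest reduced component and then replaces each reduced component by the moment-matched Gaussian of its group. This is an illustrative simplification with hypothetical names and inputs, not the authors' implementation.

```python
import math


def kl_gauss(m1, v1, m2, v2):
    """KL divergence from N(m1, v1) to N(m2, v2) for univariate Gaussians."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def reduce_mixture(components, init, max_iter=100):
    """Reduce a mixture given as (weight, mean, var) triples.

    init holds (mean, var) starting values for the reduced components.
    Returns the reduced mixture as (weight, mean, var) triples.
    """
    centers = list(init)
    masses = [0.0] * len(centers)
    for _ in range(max_iter):
        # Assignment step: send each component to its closest center in KL.
        labels = [
            min(range(len(centers)),
                key=lambda k: kl_gauss(m, v, centers[k][0], centers[k][1]))
            for _, m, v in components
        ]
        # Update step: moment-match each group of assigned components.
        updated, masses = [], []
        for k, center in enumerate(centers):
            group = [c for c, lab in zip(components, labels) if lab == k]
            mass = sum(w for w, _, _ in group)
            masses.append(mass)
            if mass == 0.0:
                updated.append(center)  # keep an empty center unchanged
                continue
            mean = sum(w * m for w, m, _ in group) / mass
            var = sum(w * (v + (m - mean) ** 2) for w, m, v in group) / mass
            updated.append((mean, var))
        if updated == centers:
            break  # assignments and centers no longer change
        centers = updated
    return [(mass, m, v) for mass, (m, v) in zip(masses, centers)]


# Hypothetical four-component mixture reduced to two components.
original = [(0.25, -2.1, 1.0), (0.25, -1.9, 1.0), (0.25, 1.9, 1.0), (0.25, 2.1, 1.0)]
reduced = reduce_mixture(original, init=[(-1.0, 1.0), (1.0, 1.0)])
```

Like k-means, a scheme of this form depends on its starting values and can stop at a local rather than global optimum, so several initialisations are usually worth trying.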
Brain image analysis is a rapidly advancing field, yet accurately identifying Regions of Interest (ROIs) remains challenging due to the limitations of traditional methods in precision, efficiency, and interpretability. While neural networks effectively handle large datasets and learn complex features, they often demand substantial computational resources and lengthy training times, and they lack transparency.
To overcome these challenges, we propose an innovative method that enhances ROI identification accuracy and interpretability while improving computational efficiency. Our approach integrates classification-based uncertainty estimation and probability-driven techniques, employing adaptive sampling via Shannon entropy and a mean-based probability framework. Block kriging and statistical inference further enable efficient and precise hotspot detection, significantly reducing training time without sacrificing performance.
The proposed method integrates seamlessly with Convolutional Neural Networks (CNNs), offering accurate hotspot detection with reduced computational complexity. A subset of the Traumatic Brain Injury (TRACK-TBI) study dataset is analyzed to demonstrate its effectiveness.
Keywords
Region of Interest (ROI)
Convolutional Neural Networks (CNNs)
Computational efficiency
Shannon Entropy
mean-based probability
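The abstract above refers to adaptive sampling via Shannon entropy. As a minimal sketch of the idea, the entropy of each image block's class probabilities can serve as an uncertainty score, with the most uncertain blocks selected for further attention. The block identifiers, probabilities, and function names are hypothetical. How the authors combine this score with block kriging and the mean-based probability framework is not described in the abstract and is not shown here.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def most_uncertain(prob_map, n):
    """Return the n block identifiers whose class probabilities have highest entropy."""
    ranked = sorted(prob_map, key=lambda b: shannon_entropy(prob_map[b]), reverse=True)
    return ranked[:n]


# Hypothetical classifier output: probability of (hotspot, background) per block.
blocks = {
    "block_a": [0.98, 0.02],   # confident prediction, low entropy
    "block_b": [0.55, 0.45],   # near-even split, high entropy
    "block_c": [0.80, 0.20],
}

selected = most_uncertain(blocks, 2)
```

Concentrating computation on high-entropy blocks is one way an entropy-driven sampler could reduce training cost, since confidently classified regions need less further processing.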