Tuesday, Aug 5: 2:00 PM - 3:50 PM
0307
Invited Paper Session
Music City Center
Room: CC-104A
Applied
No
Main Sponsor
Journal of Computational and Graphical Statistics
Co Sponsors
Section on Statistical Computing
Section on Statistical Graphics
Presentations
Deep Learning (DL) methods have dramatically increased in popularity in recent years, with significant growth in their application to various supervised learning problems. However, the greater prevalence and complexity of missing data in modern datasets present significant challenges for DL methods. Here, we provide a formal treatment of missing data in the context of deeply learned generalized linear models, a supervised DL architecture for regression and classification problems. We propose a new architecture, dlglm, one of the first to flexibly account for both ignorable and non-ignorable patterns of missingness in the input features and response at training time. We demonstrate through statistical simulation that our method outperforms existing approaches for supervised learning tasks in the presence of data that are missing not at random (MNAR). We conclude with a case study of the Bank Marketing dataset from the UCI Machine Learning Repository, in which we predict whether clients subscribed to a product based on phone survey data.
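As a rough sketch of the kind of objective such an architecture optimizes (our notation, and a generic selection-model factorization with a latent code z, not necessarily the exact dlglm formulation), write x = (x_obs, x_mis) for the features and m for the missingness mask. A variational lower bound on the observed-data log-likelihood then takes the form

\log p_\theta(y, x_{\mathrm{obs}}, m) \;\ge\;
  \mathbb{E}_{q_\phi(x_{\mathrm{mis}}, z \mid x_{\mathrm{obs}}, y, m)}\Big[
    \log p_\theta(y \mid x) + \log p_\theta(x \mid z) + \log p(z)
    + \log p_\theta(m \mid x, y)
    - \log q_\phi(x_{\mathrm{mis}}, z \mid x_{\mathrm{obs}}, y, m) \Big].

The explicit missingness model p_\theta(m \mid x, y) is what permits non-ignorable (MNAR) mechanisms; if m depends only on fully observed quantities, this term drops out of the optimization and the ignorable case is recovered.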
Keywords
Deep Learning
Missing Data
Variational Inference
Supervised Learning
MNAR
Generalized Linear Models
The backward/ancestor sampling conditional particle filter (BS-CPF; Whiteley, J. Roy. Stat. Soc. B, 2010; Lindsten, Jordan and Schön, J. Mach. Learn. Res., 2014) is a Markov transition targeting the smoothing distribution of a general state-space hidden Markov model. It is known to scale extremely well to long data records, with a provable O(log T) mixing time, where T is the data record length, leading to an overall complexity of O(T log T) [1].
The standard version of the BS-CPF, however, is not well suited to hidden Markov models with 'weakly informative' observations and stiff dynamics. Such a scenario occurs when we have access to a good Gaussian approximation of the smoothing distribution, or when the model is a time-discretisation of a continuous-time (path integral type) model. The inefficiency has two causes: the commonly used multinomial resampling is unsuitable for weakly informative observations and introduces excess variance, and a slowly mixing dynamic model renders the backward sampling step ineffective. We discuss a modified method [2] that resolves the former issue by replacing multinomial resampling with a conditional version of a recently suggested variant of systematic resampling. To avoid the degeneracy of backward sampling, we introduce a generalisation that involves backward sampling with an auxiliary 'bridging' step.
The presentation is based on the following papers:
[1] J. Karjalainen, A. Lee, S. S. Singh and M. Vihola. Mixing time of the conditional backward sampling particle filter. arXiv:2312.17572.
[2] S. Karppinen, S. S. Singh and M. Vihola. Conditional particle filters with bridge backward sampling. Journal of Computational and Graphical Statistics, 33(2):364–378, 2024. doi:10.1080/10618600.2023.2231514
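For concreteness, here is a minimal sketch of the standard CPF-BS kernel in Python, using a bootstrap proposal, plain multinomial resampling and a scalar linear-Gaussian toy model; it illustrates the baseline method, not the conditional systematic resampling or bridge backward sampling of [2], and all parameter names and model choices are ours.

import numpy as np

def cpf_bs(y, x_ref, n_particles=100, a=0.9, q=1.0, r=1.0, rng=None):
    """One sweep of a conditional particle filter with backward sampling
    for the toy model x_t = a*x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    rng = np.random.default_rng() if rng is None else rng
    T, N = len(y), n_particles
    X = np.zeros((T, N))       # particle positions
    logW = np.zeros((T, N))    # unnormalised log-weights

    # t = 0: sample from the stationary prior, pin the last particle to the reference.
    X[0] = rng.normal(0.0, np.sqrt(q / (1.0 - a**2)), size=N)
    X[0, -1] = x_ref[0]
    logW[0] = -0.5 * (y[0] - X[0])**2 / r

    for t in range(1, T):
        w = np.exp(logW[t-1] - logW[t-1].max())
        w /= w.sum()
        # Multinomial resampling for particles 0..N-2; index N-1 carries the reference.
        anc = np.append(rng.choice(N, size=N-1, p=w), N - 1)
        X[t] = a * X[t-1, anc] + np.sqrt(q) * rng.normal(size=N)
        X[t, -1] = x_ref[t]
        logW[t] = -0.5 * (y[t] - X[t])**2 / r

    # Backward sampling of a single output trajectory.
    traj = np.zeros(T)
    w = np.exp(logW[-1] - logW[-1].max())
    w /= w.sum()
    b = rng.choice(N, p=w)
    traj[-1] = X[-1, b]
    for t in range(T - 2, -1, -1):
        # Backward weights: filter weight times transition density to the chosen successor.
        logbw = logW[t] - 0.5 * (traj[t+1] - a * X[t])**2 / q
        w = np.exp(logbw - logbw.max())
        w /= w.sum()
        b = rng.choice(N, p=w)
        traj[t] = X[t, b]
    return traj

Iterating cpf_bs, feeding each returned trajectory back in as the next reference, yields a Markov chain whose invariant distribution is the smoothing distribution of this toy model; the modifications in [2] replace the resampling and backward steps above to cope with weakly informative observations and stiff dynamics.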
Keywords
Conditional particle filter
Gaussian approximation
Hidden Markov model
General state-space model
Markov chain Monte Carlo
Mixing time
This paper focuses on improving the analysis and modelling of point processes by addressing the limitations of current methods in handling complex spatial dependencies. In spatial statistics, a primary objective is the estimation of the intensity function, which describes how the expected number of events varies in space as a function of the spatial coordinates and any available covariates. Traditional methods, such as composite likelihood estimation, are often based on Poisson process assumptions. However, these methods may fail to capture the underlying spatial interactions and dependencies in non-Poissonian point processes, leading to overfitting. Spatial clustering, for example, is often misinterpreted as being driven solely by covariates, rather than by a combination of covariate effects and spatial dependence.
The main challenge is to model these spatial dependencies accurately. In composite likelihood estimation, spatial interactions are often neglected on the grounds that the dependence structure between points becomes asymptotically negligible. While this may hold for some processes, real-world applications typically involve finite data sets, where dependencies can have a significant impact on the results. In these cases, Poisson-based models distort results by attributing all spatial structure to covariates, instead of recognising the influence of clustering or interactions between points.
To overcome these limitations, the paper introduces methodologies that incorporate second-order statistics to better account for spatial dependencies. Second-order statistics, such as Ripley's K-function or the pair correlation function, play a crucial role in capturing local interactions within point patterns. By exploiting these statistics, the proposed approach provides a deeper understanding of spatial structures, particularly in non-Poissonian processes where spatial dependence and clustering are central factors.
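For reference, these second-order summaries have standard forms: for a point pattern x_1, ..., x_n observed in a window W with intensity function \lambda(\cdot), the inhomogeneous K-function is commonly estimated by

\hat{K}_{\mathrm{inhom}}(r) \;=\; \frac{1}{|W|} \sum_{i=1}^{n} \sum_{j \neq i}
  \frac{\mathbf{1}\{\|x_i - x_j\| \le r\}}{\hat{\lambda}(x_i)\,\hat{\lambda}(x_j)}\, e(x_i, x_j; r),

where e(\cdot,\cdot;r) is an edge-correction factor, and the pair correlation function is g(r) = K'(r)/(2\pi r). Under a Poisson process K_{\mathrm{inhom}}(r) = \pi r^2 and g(r) = 1, so values of g(r) above 1 at distance r indicate clustering and values below 1 indicate inhibition.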
A key contribution of this work is the incorporation of second-order local summary statistics into the modelling process, which provides detailed insight into the local structure of point patterns. Local Indicators of Spatial Association (LISA) functions are used to identify localised deviations from assumed spatial relationships, such as random labelling. These functions serve as diagnostic tools, allowing models to be adapted to account for spatial dependencies and to better distinguish between the effects of covariates and spatial interactions.
In addition, we focus on a new family of weighted, inhomogeneous local summary statistics for (functional) marked point processes. These statistics, which can be of second or higher order, are flexible and able to capture a range of local dependence structures, depending on the chosen weight function. The framework encompasses various existing summary statistics as special cases, making it a versatile tool for the analysis of marked point processes. Going beyond traditional methods, this approach offers a more refined and accurate model for complex point patterns with spatial dependencies.
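As a purely schematic illustration (the weight function, the normalisation and the absence of edge correction below are our simplifications, not the definitions used in the paper), a LISA-style, mark-weighted, inhomogeneous local K-type statistic could be computed along the following lines in Python.

import numpy as np

def local_weighted_K(points, marks, lam, r_grid, weight, area):
    """Local, mark-weighted, inhomogeneous K-type statistics (no edge correction).
    points : (n, 2) array of locations in the observation window
    marks  : length-n sequence of marks (scalars, vectors, or discretised functions)
    lam    : length-n array of intensity estimates lambda(x_i)
    r_grid : 1-d array of distances r at which to evaluate the statistic
    weight : callable w(m_i, m_j) comparing two marks
    area   : area |W| of the observation window
    Returns an (n, len(r_grid)) array; row i is the local statistic attached to x_i,
    and the rows sum to a global mark-weighted analogue of the inhomogeneous K estimator.
    """
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    K_local = np.zeros((n, len(r_grid)))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w_ij = weight(marks[i], marks[j]) / (lam[i] * lam[j])
            K_local[i] += w_ij * (d[i, j] <= r_grid)
    return K_local / area

# Example weight for functional marks discretised on a common grid:
# similarity of two waveforms via their mean squared difference.
def waveform_weight(f_i, f_j):
    diff = np.asarray(f_i) - np.asarray(f_j)
    return np.exp(-np.mean(diff**2))

With unit weights this reduces to the per-point contributions of the inhomogeneous K-function estimator displayed earlier, which is the sense in which such a framework nests existing summaries.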
In terms of practical application, the effectiveness of the proposed methods is demonstrated through simulation studies. These simulations show that the new techniques can detect local deviations from random labelling and account for spatial interactions more accurately than traditional approaches. Furthermore, we present a real-world application involving earthquake point patterns, in which functional marks (such as seismic waveforms) are used to study the spatial dependence between events. This case study illustrates the practical utility of the methodology in the analysis of complex point processes.
In conclusion, this paper contributes to spatial statistics by offering an improved method for estimating the intensity function of point processes, particularly in cases where interactions and spatial dependencies are significant. By incorporating second-order local statistics and adjusting for spatial dependence, the methodology improves inference on the relationship between covariates and the structure of point processes. The introduction of mark-weighted, inhomogeneous local summary statistics further improves the analysis of complex spatial patterns, providing a useful tool for applications in fields such as ecology, epidemiology and seismology.
Keywords
Spatial Point Process
Local Indicators of Spatial Association (LISA)
Inhomogeneous Mark-Weighted Function
Composite likelihood
Spatial Dependencies
Random labelling
In many regression settings the unknown coefficients may have some known structure, for instance they may be ordered in space or correspond to a vectorized matrix or tensor. At the same time, the unknown coefficients may be sparse, with many nearly or exactly equal to zero. However, many commonly used priors and corresponding penalties for coefficients do not encourage simultaneously structured and sparse estimates. In this article we develop structured shrinkage priors that generalize multivariate normal, Laplace, exponential power and normal-gamma priors. These priors allow the regression coefficients to be correlated a priori without sacrificing elementwise sparsity or shrinkage. The primary challenges in working with these structured shrinkage priors are computational, as the corresponding penalties are intractable integrals and the full conditional distributions that are needed to approximate the posterior mode or simulate from the posterior distribution may be nonstandard. We overcome these issues using a flexible elliptical slice sampling procedure, and demonstrate that these priors can be used to introduce structure while preserving sparsity. Supplementary materials for this article are available online.
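The elliptical slice sampling step mentioned above follows the generic recipe of Murray, Adams and MacKay (2010); as a minimal sketch (not the authors' implementation), assume the coefficients beta have, conditionally on any latent scale parameters, a zero-mean multivariate normal prior with Cholesky factor prior_chol, and that loglik returns their log-likelihood. One update is then:

import numpy as np

def elliptical_slice(beta, prior_chol, loglik, rng):
    """One elliptical slice sampling update (Murray, Adams and MacKay, 2010)
    for beta with prior N(0, Sigma), where Sigma = prior_chol @ prior_chol.T."""
    nu = prior_chol @ rng.normal(size=beta.shape)   # auxiliary draw from the prior
    log_y = loglik(beta) + np.log(rng.uniform())    # log slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)           # initial angle on the ellipse
    theta_min, theta_max = theta - 2.0 * np.pi, theta
    while True:
        proposal = beta * np.cos(theta) + nu * np.sin(theta)
        if loglik(proposal) > log_y:
            return proposal                         # accepted: proposal lies on the slice
        # Shrink the angle bracket towards 0 and retry.
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

In the structured shrinkage setting this update would alternate, within a Gibbs sampler, with draws of the latent scale (and any global shrinkage) parameters that define the conditional Gaussian prior.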