Thursday, Aug 7: 10:30 AM - 12:20 PM
4228
Contributed Papers
Music City Center
Room: CC-105B
Main Sponsor
Section on Statistics and the Environment
Presentations
Photovoltaic (PV) solar power generation represents a viable option for meeting increased electricity demand. Accurate solar power predictions are crucial for feasibility studies of new installations and successful integration of PV systems into existing power grids. This need is particularly acute in South Africa, where the expansion of renewable energy capacity must be balanced against grid stability. This study applies kernel-based approaches to PV power prediction, focusing on capturing multi-scale temporal patterns in solar power generation. Using data from a large-scale PV installation in South Africa's Northern Cape region, the study investigates how kernel ridge regression can model both the inherent periodicity of solar power generation and its weather-dependent variations. The methodology addresses the complex interplay between weather patterns, seasonal variations, and power generation. The research contributes to the practical advancement of PV power prediction in renewable energy applications, with direct implications for grid integration and operational planning in regions with significant solar power installations.
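A minimal sketch of kernel ridge regression that pairs a periodic kernel on hour of day with a squared-exponential kernel on a weather covariate. The product-kernel form, the hypothetical cloud-fraction covariate, the hyperparameters, and the synthetic data are all assumptions for illustration, not the study's specification.

```python
import numpy as np

def periodic_kernel(t1, t2, period=24.0, length=0.5):
    # Exp-sine-squared kernel on time in hours: encodes the repeating daily cycle
    d = np.abs(t1[:, None] - t2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

def rbf_kernel(x1, x2, length=0.5):
    # Squared-exponential kernel on a weather covariate (hypothetical cloud fraction)
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * d2 / length ** 2)

def kernel(t1, x1, t2, x2):
    # Product kernel (assumed form): daily shape modulated by weather similarity
    return periodic_kernel(t1, t2) * rbf_kernel(x1, x2)

def krr_fit(K, y, lam=1e-2):
    # Dual coefficients alpha solve (K + lam * I) alpha = y
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

# Synthetic hourly data over 10 days (illustrative only, not the Northern Cape data)
rng = np.random.default_rng(0)
t = np.arange(24 * 10, dtype=float)
cloud = rng.uniform(0.0, 1.0, t.size)
daylight = np.clip(np.sin(np.pi * ((t % 24) - 6) / 12), 0.0, None)
power = daylight * (1.0 - cloud)

# Train on the first 8 days, predict the last 2
train, test = slice(0, 24 * 8), slice(24 * 8, None)
alpha = krr_fit(kernel(t[train], cloud[train], t[train], cloud[train]), power[train])
pred = kernel(t[test], cloud[test], t[train], cloud[train]) @ alpha

rmse_krr = np.sqrt(np.mean((pred - power[test]) ** 2))
rmse_mean = np.sqrt(np.mean((power[train].mean() - power[test]) ** 2))
```

The periodic term lets one model share the daily shape across days, while the weather term adjusts the amplitude; in practice the period, length scales, and ridge penalty would be tuned, for example by cross-validation.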
Keywords
Photovoltaic power prediction
kernel methods
machine learning
renewable energy
applied statistics
In meteorological forecasting, models with convection-allowing grid spacing have significantly improved the simulation of heavy rainfall associated with warm-season convection. However, substantial errors in precipitation location persist, posing challenges for critical applications such as flood prediction. In this study, we develop machine learning (ML) tools to correct displacement errors in High-Resolution Ensemble Forecast (HREF) members using detailed mesoscale weather data from the Storm Prediction Center. The Method for Object-based Diagnostic Evaluation (MODE) was used to identify key characteristics of precipitation objects, which served as inputs to ML models designed to correct centroid location errors of mesoscale convective systems across eight HREF ensemble members. Trained on data from 2018 to 2023, the models were tested in real time during the 2024 Flash Flood and Intense Rainfall experiments. The best-performing ML model reduced storm centroid location error by 35–51% on average relative to the original HREF forecasts, demonstrating its potential to improve flood prediction.
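The keywords list great-circle distance as the error metric. A minimal sketch, assuming centroid error is the haversine distance between matched forecast and observed MODE object centroids on a spherical Earth; the coordinates below are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def great_circle_km(lat1, lon1, lat2, lon2):
    # Haversine formula for the distance between two (lat, lon) points in degrees
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def mean_error_km(forecast, observed):
    # Average centroid displacement over matched (forecast, observed) object pairs
    return sum(great_circle_km(*f, *o) for f, o in zip(forecast, observed)) / len(observed)

def percent_reduction(err_raw, err_corrected):
    # Relative improvement of the corrected forecast over the raw forecast
    return 100.0 * (err_raw - err_corrected) / err_raw

# Hypothetical centroids (lat, lon) for two storm objects
observed = [(35.0, -97.0), (40.0, -90.0)]
raw = [(35.5, -96.0), (40.8, -91.0)]
corrected = [(35.1, -96.8), (40.2, -90.3)]
reduction = percent_reduction(mean_error_km(raw, observed), mean_error_km(corrected, observed))
```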
Keywords
Mesoscale convective systems
Quantitative precipitation forecast
Mesoscale weather data
Great-circle distance
Machine learning postprocessor
Probability matched mean
Rapid changes in the cryosphere can drive major climate impacts such as global sea-level rise. Computer models are useful for understanding the behavior of Antarctic ice sheets and can be used to study their contribution to rising sea levels. However, uncertainty quantification for model parameters is challenging because the model outputs and observations are high-dimensional and spatially correlated. Furthermore, they are semicontinuous, with an excess of zeros. To address these challenges, we propose a diffusion model-based emulator that can accurately generate pseudodata across a range of parameter settings. Because the likelihood implied by the emulator is intractable, we propose an approximate Bayesian computation method with a Siamese network. The Siamese network is trained to determine, based on the similarity of their features, whether images generated by the emulator with proposed parameters closely resemble the observational data. We apply our method to calibrate the computer model for the West Antarctic Ice Sheet using modern ice sheet observations and to generate future projections of sea-level rise, a setting in which current approaches are infeasible because of the challenges above.
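A minimal sketch of rejection ABC with a feature-based distance. The emulator and the embedding are hypothetical stand-ins: a zero-inflated gamma field replaces the diffusion model, and two hand-crafted features (fraction of nonzero cells, mean of nonzero values) replace a trained Siamese network branch.

```python
import numpy as np

def toy_emulator(theta, rng, grid=(20, 20), p_zero=0.4):
    # Hypothetical stand-in for the diffusion-model emulator: a semicontinuous field
    # with exact zeros (e.g., ice-free cells) and positive values scaled by theta
    field = rng.gamma(shape=2.0, scale=theta, size=grid)
    field[rng.random(grid) < p_zero] = 0.0
    return field

def embed(field):
    # Stand-in for one branch of a trained Siamese network (hand-crafted features here)
    positive = field[field > 0]
    return np.array([np.mean(field > 0), positive.mean() if positive.size else 0.0])

def feature_distance(a, b):
    # Distance between embeddings; a trained Siamese network would learn this similarity
    return float(np.linalg.norm(embed(a) - embed(b)))

def abc_rejection(observed, prior_sampler, rng, n_draws=2000, quantile=0.05):
    # Rejection ABC: keep prior draws whose emulated fields are closest to the observation
    thetas = np.array([prior_sampler(rng) for _ in range(n_draws)])
    dists = np.array([feature_distance(toy_emulator(th, rng), observed) for th in thetas])
    eps = np.quantile(dists, quantile)  # tolerance set as a distance quantile
    return thetas[dists <= eps]

rng = np.random.default_rng(1)
observed = toy_emulator(1.5, rng)  # synthetic "observation" with known theta = 1.5
posterior = abc_rejection(observed, lambda r: r.uniform(0.1, 5.0), rng)
```

The accepted draws approximate the posterior of theta; replacing the hand-crafted embedding with a learned one is what lets the approach scale to high-dimensional spatial images.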
Keywords
Ice model calibration
diffusion model
approximate Bayesian computation
Siamese network
semicontinuous spatial data
The synthetic control (SC) method is widely used to estimate causal effects from panel data. However, the classical SC framework does not account for spatial dependence and spillover effects, which are common when observational units are spatial entities such as cities, counties, or regions. Spatial correlation and latent spatial confounding can bias estimates, yet little research has addressed these issues systematically, and simulation studies in this context remain scarce.
We propose the spatially augmented Bayesian synthetic control (SA-BSC), which integrates geographic distance into spike-and-slab priors on donor weights. Two specifications are available: distance-to-binary (D2B), in which a control unit's prior inclusion probability decays with distance, and distance-to-variance (D2V), which exponentially shrinks the prior variance of distant donors. This approach incorporates additional spatial information into synthetic control estimation, leveraging the flexibility of semiparametric spatial priors for weight estimation. Through extensive simulations varying the pre-treatment window length, spatial autocorrelation, and magnitude of spillover effects, we find that SA-BSC substantially reduces root-mean-squared error and improves posterior-interval coverage compared with standard non-spatial synthetic control methods.
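A sketch of how distance could enter the two prior specifications. The abstract states only that D2B inclusion probability decays with distance and that D2V shrinks the prior variance exponentially; the exponential decay for D2B, the shared range parameter rho, the unconstrained Gaussian slab, and all parameter values are assumptions.

```python
import numpy as np

def d2b_inclusion(dist, pi0=0.5, rho=100.0):
    # D2B: prior inclusion probability decays with distance (exponential form assumed)
    return pi0 * np.exp(-dist / rho)

def d2v_slab_variance(dist, tau0_sq=1.0, rho=100.0):
    # D2V: slab (prior) variance shrinks exponentially for distant donors
    return tau0_sq * np.exp(-dist / rho)

def draw_weights(dist, spec, rng, pi0=0.5, tau0_sq=1.0, rho=100.0, spike_sd=1e-3):
    # One prior draw of donor weights under a spike-and-slab prior
    n = dist.size
    if spec == "D2B":
        incl_p = d2b_inclusion(dist, pi0, rho)
        slab_var = np.full(n, tau0_sq)
    elif spec == "D2V":
        incl_p = np.full(n, pi0)
        slab_var = d2v_slab_variance(dist, tau0_sq, rho)
    else:
        raise ValueError("spec must be 'D2B' or 'D2V'")
    included = rng.random(n) < incl_p  # latent inclusion indicators
    sd = np.where(included, np.sqrt(slab_var), spike_sd)  # slab if included, else spike
    return rng.normal(0.0, sd), included

rng = np.random.default_rng(2)
dist = np.array([10.0, 500.0])  # hypothetical donor distances to the treated unit (km)
incl_rate = np.mean([draw_weights(dist, "D2B", rng)[1] for _ in range(5000)], axis=0)
```

Under D2B a distant donor is rarely switched on at all; under D2V it is switched on as often as a near donor but its weight is pulled tightly toward zero.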
We illustrate the application of SA-BSC with a large-scale observational study examining acute heat-stroke hospitalizations among an open cohort of fee-for-service Medicare beneficiaries in the contiguous United States, covering 34.5 million individuals from 2000 to 2016. Daily maximum heat-index data are linked to residential ZIP codes, defining heat waves as periods of two or more consecutive days exceeding the local 95th percentile. Each exposed ZIP-day constitutes a treated unit, with counterfactual donors constructed from contemporaneously unexposed ZIP codes. SA-BSC provides spatially coherent counterfactual outcomes and robust, interpretable causal estimates, highlighting its value for observational studies with complex spatial structures.
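The heat-wave definition can be sketched for a single ZIP code's daily series. This assumes the local 95th percentile is computed over that ZIP code's full record and that exceedance is strict; the data are synthetic.

```python
import numpy as np

def heat_wave_days(heat_index, pct=95.0, min_run=2):
    # Local threshold: this ZIP code's own 95th percentile of daily maximum heat index
    threshold = np.percentile(heat_index, pct)
    hot = heat_index > threshold
    flags = np.zeros_like(hot)
    run_start = None
    # Appending False acts as a sentinel so a run at the end of the series is closed
    for i, is_hot in enumerate(np.append(hot, False)):
        if is_hot and run_start is None:
            run_start = i
        elif not is_hot and run_start is not None:
            if i - run_start >= min_run:  # keep runs of at least two consecutive days
                flags[run_start:i] = True
            run_start = None
    return flags

# Synthetic series: one isolated hot day (index 10) and a two-day hot spell (50-51)
series = np.ones(100)
series[[10, 50, 51]] = 5.0
flags = heat_wave_days(series)
```

Only the two-day spell is flagged; the isolated hot day exceeds the threshold but does not meet the two-consecutive-day requirement.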
Keywords
Gun violence
Heatwaves
Causal inference
Spatial statistics
Synthetic controls
Environmental health
As the power grid moves toward a more renewable future, weather-driven energy sources such as solar power will form an increasingly large share of electricity generation. The variability, non-Gaussianity, and intermittency of solar resources challenge current grid operation paradigms, and realistic data scenarios are required for grid planning and operational studies. However, such data are not available at the space-time resolution needed for realistic grid models. Given sparse spatial samples, we introduce a method for spatiotemporal prediction within a functional data analysis framework for data that exhibit nonstationary phase misalignment. The approach is illustrated on a challenging high-frequency irradiance dataset and compared with existing methods.
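Phase misalignment means that curves share a shape but peak at different times. The sketch below uses plain shift registration (one integer lag per curve, chosen by maximizing the inner product with a reference), a much simpler technique than the nonstationary registration described in the abstract, shown only to illustrate what alignment does. np.roll wraps around, which is acceptable here only because the example curves are near zero at both ends.

```python
import numpy as np

def best_shift(curve, reference, max_lag):
    # Integer lag (in samples) whose back-shifted curve best matches the reference
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(np.roll(curve, -lag), reference) for lag in lags]
    return int(lags[np.argmax(scores)])

def shift_register(curves, reference, max_lag=30):
    # Align each curve (one per row) to the reference; return aligned curves and lags
    lags = np.array([best_shift(c, reference, max_lag) for c in curves])
    aligned = np.array([np.roll(c, -lag) for c, lag in zip(curves, lags)])
    return aligned, lags

# Synthetic irradiance-like bumps whose peaks are offset in time at two sites
t = np.linspace(0.0, 1.0, 200, endpoint=False)
reference = np.exp(-((t - 0.5) / 0.05) ** 2)
curves = np.array([np.roll(reference, 7), np.roll(reference, -5)])
aligned, lags = shift_register(curves, reference)
```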
Keywords
curve registration
distributed photovoltaic systems
functional data analysis
spatiotemporal prediction
Spatial stream network (SSN) models characterize correlated ecological processes in dendritic ecosystems. Conventional SSN models rely on preprocessed stream networks and point-to-point hydrologic distances. However, this preprocessing can be labor-intensive and time-consuming over large spatial domains. We therefore propose to infer the functional connectivity of stream networks stochastically. Our physically guided model uses the knowledge that water flows from high to low elevation and that flow rate increases when two tributaries merge. We also leverage the hierarchical branching architecture of dendritic networks to reduce computational cost and uncertainty. Spatial autoregressive models built on the inferred SSNs propagate stochasticity between network connectivity and population dynamics in a Bayesian framework. In simulated examples, we show that our mechanistic model facilitates learning about the functional network and improves predictive performance. We also demonstrate our approach in a large-scale case study using native brook trout (Salvelinus fontinalis) count data.
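The two physical constraints can be sketched on a graph of candidate stream nodes. Each node picks a lower-elevation neighbor as its downstream node at random, favouring steeper drops (a softmax weighting that is an assumption here), and flow accumulation then grows wherever tributaries merge. The node elevations and neighbor lists are hypothetical.

```python
import numpy as np

def sample_downstream(elev, neighbors, rng, temperature=1.0):
    # For each node, pick one lower-elevation neighbor as its downstream node.
    # Probabilities favour steeper drops (softmax of drop / temperature, assumed form).
    downstream = np.full(len(elev), -1)  # -1 marks an outlet (no lower neighbor)
    for i, nbrs in enumerate(neighbors):
        lower = [j for j in nbrs if elev[j] < elev[i]]
        if lower:
            drops = np.array([elev[i] - elev[j] for j in lower])
            p = np.exp((drops - drops.max()) / temperature)
            downstream[i] = rng.choice(lower, p=p / p.sum())
    return downstream

def flow_accumulation(elev, downstream):
    # Visit nodes from high to low elevation so all upstream contributions arrive first;
    # each node contributes one unit, so flow increases where tributaries merge.
    flow = np.ones(len(elev))
    for i in np.argsort(-np.asarray(elev)):
        if downstream[i] >= 0:
            flow[downstream[i]] += flow[i]
    return flow

# Hypothetical Y-shaped network: nodes 0 and 1 are headwaters, 2 a confluence, 3 the outlet
elev = [10.0, 9.0, 5.0, 1.0]
neighbors = [[2], [2], [0, 1, 3], [2]]
rng = np.random.default_rng(3)
downstream = sample_downstream(elev, neighbors, rng)
flow = flow_accumulation(elev, downstream)
```

Because every downstream link points strictly downhill, the sampled network is guaranteed to be acyclic, and repeated draws give a distribution over connectivity that a Bayesian model could propagate.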
Keywords
Bayesian hierarchical models
Markov random field
population models
space-time dynamics