Contributed Poster Presentations: Section on Statistics and the Environment

Ryan Peterson Chair
University of Colorado - Anschutz Medical Campus
 
Tuesday, Aug 6: 2:00 PM - 3:50 PM
6049 
Contributed Posters 
Oregon Convention Center 
Room: CC-Hall CD 

Main Sponsor

Section on Statistics and the Environment

Presentations

35 A Matrix Factorization-Based Method for Solar Spectral Irradiance Missing Data Imputation

Solar Spectral Irradiance (SSI) is an important quantity in geophysical research, but missing data due to instrument downtime poses challenges. Existing methods, such as matrix completion and linear interpolation, struggle to recover the missing values because they neither enforce temporal smoothness nor accommodate the 11-year SSI cycle driven by periodic solar magnetic activity.

This paper introduces SoftImpute with Projected Auto-regressive regularization (SIPA), a matrix factorization-based algorithm that addresses downtime missingness. SIPA combines low-rank matrix pursuit with temporal smoothness regularization and is fit by an efficient alternating algorithm. A projection applied to the auto-regressive (AR) penalty term keeps the penalty from disturbing non-downtime entries.

We prove the algorithm's non-decreasing property, analyze convergence rates, and design model assumptions for uncertainty quantification. An optimal sample splitting strategy for universal inference is given. Experiments on simulated and real data demonstrate SIPA's superiority over existing methods in recovering downtime-induced missing SSI data.

Keywords

solar irradiance

vector time series

missing data imputation

matrix low-rank completion

alternating minimization

uncertainty quantification 

Abstracts


First Author

Yuxuan Ke

Presenting Author

Yuxuan Ke

36 Alaska Permanent Snow and Ice Classification for the National Resources Inventory Survey

Permanent snow and ice play a crucial role in Earth's ecological system, affecting both climate and hydrology. However, classifying them accurately remains challenging, especially in remote areas where field data collection is difficult. In this study, we leverage data from the 2007 Alaska National Resources Inventory (NRI) survey, publicly available remote sensing data, and global glacier inventory data to improve the classification of permanent snow and ice in Alaska. More specifically, our aims are to (i) develop machine learning methods that classify permanent snow and ice, in line with the NRI definition, using extensive publicly available remote sensing data; and (ii) produce annual maps of permanent snow and ice in Alaska to assist in the sampling design and statistical inference of future surveys. To overcome issues of class imbalance and improve model training, we integrate multiple data sources, create relevant variables, and use the Random Forests algorithm for classification, achieving 98.6% accuracy. Furthermore, we apply a cross-conformal prediction approach to quantify the uncertainty in the Random Forests prediction.

Keywords

Permanent snow and ice

Random forests classification

Remote sensing data 

Abstracts


Co-Author

Zhengyuan Zhu, Iowa State University

First Author

Yingchao Zhou, Iowa State University

Presenting Author

Yingchao Zhou, Iowa State University

38 Higher-Order Spatial Structure Functions for Exploring Spatial Extremes

Spatial statistics has long relied on measures of second-order dependence (e.g., covariance functions and variograms) to characterize and model spatial dependence. In the turbulence literature, higher-order spatial dependence measures, such as third- and fourth-order structure functions (variograms), have been instrumental in characterizing important behavior such as turbulent energy cascades. Here, we investigate the use of these higher-order structure functions to better characterize the dependence structure of spatial extremes, which can help with the specification of appropriate statistical models for such dependence. We illustrate the approach on simulated and real-world environmental data.
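
As a rough illustration of the quantity discussed above, the following sketch computes an empirical structure function of a given order on a regularly spaced one-dimensional transect, taking it to be the average of the lag-h increments raised to that power. The function name, the 1-D setting, and the use of signed (rather than absolute) increments are assumptions made for this example, not details taken from the poster.

```python
def structure_function(z, lag, order):
    """Empirical structure function of a regularly spaced 1-D series.

    Averages (z[i + lag] - z[i]) ** order over all available pairs.
    Hypothetical helper for illustration only.
    """
    if lag < 1 or lag >= len(z):
        raise ValueError("lag must be between 1 and len(z) - 1")
    increments = [z[i + lag] - z[i] for i in range(len(z) - lag)]
    return sum(d ** order for d in increments) / len(increments)


if __name__ == "__main__":
    series = [0.0, 1.0, 3.0, 6.0]
    # order 2 is twice the classical empirical semivariogram at this lag
    print(structure_function(series, lag=1, order=2))
    # odd orders keep the sign of the increments, so they respond to asymmetry
    print(structure_function(series, lag=1, order=3))
```

With order 2 this reduces to the familiar variogram-type summary; orders 3 and 4 respond to the asymmetry and tail weight of the increments, which is why they are of interest for extremes.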

Keywords

structure functions

spatial extremes

turbulence 

Abstracts


Co-Author(s)

Likun Zhang, University of Missouri-Columbia
Christopher Wikle, University of Missouri-Columbia

First Author

Souvik Bag

Presenting Author

Souvik Bag

39 Increasing the Accuracy of Tree-Ring Data Processing to Improve Models for Reconstructing Climate

Tree-ring data is used to reconstruct past climate and to predict future climate trends. In each year of a tree's lifespan, a distinct ring is added to the tree's width, and the widths of individual rings vary depending on the environment in which the tree lives. To process tree-ring data, two cores are extracted from each tree, and then all cores are combined before analysis. We aim to improve the modeling of future climate scenarios by assessing the accuracy of tree-ring data collection. Our investigation of data from the International Tree-Ring Data Bank found that correlated tree cores do not necessarily have the same ring widths, and trees with low correlation may have worse correlation with local climate data. These findings imply that only trees with moderate or better internal correlation should be used for climate modeling. To target differences among tree-ring data processing methods, we collected cores from trees in Elon University Forest. With this data, we combined and correlated the widths of rings on each core with local climate. After combining these cores according to existing dendrochronological methods, we recommend the approaches that produce the best fit with local climate.

Keywords

Tree-rings

Climate

Environment

Dendrochronology

Climate Modeling 

Abstracts


Co-Author(s)

Nicholas Bussberg, Elon University
David Vandermast, Elon University

First Author

Bailey Reutinger

Presenting Author

Bailey Reutinger

40 Outlier Identification in Censored Environmental Time Series

When estimating trends in contaminated media, it is common to jointly observe apparent outliers and non-detects (i.e., left-censored observations). Identifying outliers usually requires de-trending the time series prior to screening for outlying residuals. The screening in turn requires a reference distribution from which to judge outlying points.

The combination of censored data, nonlinear trends, and outliers raises challenges: 1) how to estimate the trend prior to treating non-detects, and vice-versa? 2) how to compute 'censored residuals' from the trend? 3) how to build a reference distribution given substantial censoring?

We previously proposed a Monte Carlo mixture model that samples non-detects from a class of bounded distributions on the interval (0, DL), where DL is a left-censoring limit. We illustrate how this mixture model can accurately identify outliers by constructing a broad trend from the mean estimate of repeated draws from the mixture model, then studentizing the trend residuals to both flag and down-weight outliers via an appropriate kernel applied to the studentized distance from the trend.

The benefits of this strategy are explored. 
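
The two ingredients described above, repeated draws for non-detects and kernel down-weighting of studentized residuals, can be sketched as follows. This is a simplified illustration and not the author's mixture model: it assumes a single detection limit, draws from a uniform distribution on (0, DL) as one member of the class of bounded distributions, and uses the Tukey bisquare kernel with the conventional tuning constant 4.685 as a stand-in for the unspecified kernel. All function names are hypothetical.

```python
import random
from statistics import fmean


def impute_nondetects(values, censored, dl, n_draws=500, seed=0):
    """Replace each non-detect by the mean of repeated draws on (0, dl).

    Assumption: uniform draws and one common detection limit `dl`.
    Detected values are passed through unchanged.
    """
    rng = random.Random(seed)
    imputed = []
    for value, is_censored in zip(values, censored):
        if is_censored:
            draws = [rng.uniform(0.0, dl) for _ in range(n_draws)]
            imputed.append(fmean(draws))
        else:
            imputed.append(value)
    return imputed


def bisquare_weight(u, c=4.685):
    """Tukey bisquare kernel: 1 at u = 0, falling to 0 for |u| >= c."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


if __name__ == "__main__":
    obs = [1.0, 5.0, 1.0, 6.2]            # reported values (DL where censored)
    flags = [True, False, True, False]    # True marks a non-detect
    print(impute_nondetects(obs, flags, dl=1.0))
    print(bisquare_weight(0.5), bisquare_weight(6.0))
```

A trend fitted to the imputed series would then supply the residuals that are studentized and passed to the kernel, so that points far from the trend receive weights near zero.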

Keywords

Outliers

Left-Censored Data

Time Series

Environmental

Monte Carlo

Trends 

Abstracts


First Author

Kirk Cameron, Macstat Consulting, Ltd.

Presenting Author

Kirk Cameron, Macstat Consulting, Ltd.

41 Seed Choice for Fitting Gaussian Mixture Models of Instar Characteristics in Lepidopterans

Estimation of a lepidopteran's instar prior to pupation has applications in applied research. We modeled successive larval instars for the study species Hyphantria cunea based on head capsule width data using Gaussian mixture models (GMMs) fit by Expectation-Maximization (EM). To generate starting values under the assumption of n instars, we calculated n head capsule widths consistent with Brooks-Dyar spacing with the smallest value positioned at a small quantile q1 of the observed head capsule widths and the largest value positioned at a large quantile qn of the observed widths. We used the means of the n resulting clusters as starting values for the means of the Gaussian distributions, the variances of the head capsule widths in each cluster as starting values for the variances of the Gaussian distributions, and the proportions in each cluster as starting values for the mixing proportions of the Gaussian distributions. We used Brooks-Dyar spacing to select the number of instars. We found that this form of seed, in contrast to other methods of generating seeds, produces reliable, rapid convergence of the EM algorithm to biologically reasonable models.
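
A minimal sketch of the seeding idea follows. Brooks-Dyar spacing is taken here to mean a constant ratio between successive instar widths, so the n centers form a geometric sequence from the q1 quantile to the qn quantile. The abstract does not say how clusters are formed from those centers; assigning each observation to its nearest center is an assumption of this example, as are the function names, the default quantiles, and the interpolating quantile rule.

```python
from statistics import fmean, pvariance


def quantile(sorted_x, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def brooks_dyar_centers(widths, n, q1=0.05, qn=0.95):
    """n geometrically spaced widths from quantile q1 to quantile qn (n >= 2)."""
    xs = sorted(widths)
    smallest, largest = quantile(xs, q1), quantile(xs, qn)
    ratio = (largest / smallest) ** (1.0 / (n - 1))
    return [smallest * ratio ** k for k in range(n)]


def em_seed(widths, n, q1=0.05, qn=0.95):
    """Starting means, variances, and mixing proportions for an n-component GMM."""
    centers = brooks_dyar_centers(widths, n, q1, qn)
    clusters = [[] for _ in centers]
    for w in widths:
        # assumption: each width joins the cluster of its nearest center
        nearest = min(range(n), key=lambda k: abs(w - centers[k]))
        clusters[nearest].append(w)
    if any(not c for c in clusters):
        raise ValueError("empty cluster; try other quantiles or fewer instars")
    means = [fmean(c) for c in clusters]
    variances = [pvariance(c) for c in clusters]
    proportions = [len(c) / len(widths) for c in clusters]
    return means, variances, proportions


if __name__ == "__main__":
    head_widths = [0.9, 1.0, 1.1, 1.9, 2.0, 2.1, 3.9, 4.0, 4.1]  # made-up data
    print(em_seed(head_widths, n=3, q1=0.0, qn=1.0))
```

The returned triples would be handed to an EM routine as its initial parameter values; the widths in the usage example are invented for illustration.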

Keywords

Brooks-Dyar’s rule

Gaussian mixture model

instar

EM seed 

Abstracts


Co-Author(s)

Mykaela Tanino-Springsteen, Department of Biological Sciences, University of Denver
Dhaval Vayas, Department of Biological Sciences, University of Denver
Audrey Mitchell, Department of Biological Sciences, University of Denver
Shannon Murphy, Department of Biological Sciences, University of Denver

First Author

Catherine Durso, University of Denver

Presenting Author

Catherine Durso, University of Denver

42 Soil Lipid Atlas

Lipids play a crucial role in soil ecology. They influence the formation of soil organic matter and serve as indicators of responses to environmental changes. Despite this, the field of lipidomics is still in its nascent stages. There is a pressing need to compile lipid profiles from microbial isolates that can reflect the functional groups, thereby enhancing biological understanding of soil lipid data. To address this gap, we created the Soil Lipid Atlas, a comprehensive database for soil lipids. This resource enables researchers to explore lipids and their relationship with specific microbial taxa and functional groups. Within the atlas, users may select studies and compare different treatments and stressors, which lets them observe differences in the presence and absence of lipids under different treatment conditions and across different strains. Furthermore, users can perform statistical analyses on the log2 fold changes of lipids on both a lipid-by-lipid basis and a strain-by-strain basis. The vision of this atlas is to provide a platform for researchers to contribute lipidomics data, fostering the growth of the database as a community-driven resource.

Keywords

Lipidomics

Database

Microbiome

Isolates 

Abstracts


Co-Author(s)

Lisa Bramer, Pacific Northwest National Laboratory
Sheryl Bell, Pacific Northwest National Laboratory
Kirsten Hofmockel, Pacific Northwest National Laboratory
Sneha Couvillion, Pacific Northwest National Laboratory

First Author

Damon Leach, Pacific Northwest National Laboratory

Presenting Author

Damon Leach, Pacific Northwest National Laboratory

43 Spatial Data Fusion with the Multiresolution-Gaussian Process model LatticeKrig

Remotely sensed observations of the atmosphere play an important role in climate research since they often have more extensive spatial coverage than surface measurements. One challenge with satellite data in particular is that an observation represents a spatial average over the satellite footprint rather than a point location. Moreover, this problem is compounded when the footprints vary in size and degree of overlap between successive observations. Our goal is to combine observations of the same process from different remotely-sensed platforms into a single model, precisely accounting for heterogeneity across multiple observations and spatial averaging. We adapt earlier data fusion methods, often referred to as change-of-support methods in geostatistics, using LatticeKrig, a fixed-rank multiresolution-Gaussian Process model. This framework leverages sparse linear algebra and efficient basis representations to provide computational efficiency when faced with large data volumes. We demonstrate our method by fusing total column carbon monoxide (CO) from the MOPITT and TROPOMI satellite instruments for the Australasia and Maritime Southeast Asia regions. 
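
The change-of-support idea described above can be written compactly. The notation below is assumed for illustration (footprint B_i, latent field g, basis functions \varphi_j with coefficients c_j, measurement error \varepsilon_i) and is not taken from the poster: each satellite observation is modeled as the footprint average of the field, and a basis expansion of the field turns that average into a linear combination of footprint-averaged basis functions.

```latex
y_i = \frac{1}{|B_i|} \int_{B_i} g(s)\, ds + \varepsilon_i,
\qquad
g(s) = \sum_{j=1}^{m} c_j \varphi_j(s)
\;\Longrightarrow\;
y_i = \sum_{j=1}^{m} c_j \left( \frac{1}{|B_i|} \int_{B_i} \varphi_j(s)\, ds \right) + \varepsilon_i .
```

Under this form, footprints of different sizes and overlaps change only the averaged basis terms in parentheses, so observations from several instruments can share one set of coefficients c_j.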

Keywords

change-of-support

data fusion

basis function

satellite data

total column carbon monoxide

spatial statistics 

Abstracts


Co-Author(s)

Douglas Nychka, Colorado School of Mines
Dorit Hammerling, Colorado School of Mines

First Author

Ryan Peterson, Colorado School of Mines

Presenting Author

Ryan Peterson, Colorado School of Mines

44 Using Data Assimilation to Reconstruct Paleoclimate for East Asia Since the 14th Century

In this study, we use the Reconstructed East Asian Climate Historical Encoded Series (REACHES) data, derived from Chinese historical documents, to reconstruct temperature in East Asia since the 14th century. The REACHES temperature indices are biased because of missing values, which primarily represent normal weather. To address this, we employ simple kriging to impute the missing data, with the mean of the underlying spatial process set to zero. To enhance temperature reconstruction accuracy, we propose a data assimilation approach that combines the kriged REACHES temperature data with the Last Millennium Ensemble (LME) reanalysis data. Our approach first estimates the temperature distribution by applying regularized maximum likelihood, incorporating a fused lasso penalty within a nonstationary time series model based on the LME data. The resulting distribution serves as the prior, which is subsequently updated with the REACHES data using the Kalman filter and smoother to obtain refined temperatures. Our approach, which integrates historical records, climate model output, and statistical techniques, sheds light on past temperature variations and refines historical temperature estimates.
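
The prior-then-update logic described above can be illustrated with a single scalar Kalman measurement update. This is a textbook sketch under simplifying assumptions (one location, a directly observed state, Gaussian errors, known variances); the function and argument names are hypothetical, and it is not the multivariate filter and smoother used in the study.

```python
def kalman_update(prior_mean, prior_var, obs, obs_var):
    """One scalar Kalman measurement update for a directly observed state.

    prior_mean, prior_var: prior (for example, model-based) mean and variance
    obs, obs_var: observation (for example, a kriged index) and its error variance
    Returns the posterior mean and variance.
    """
    gain = prior_var / (prior_var + obs_var)   # Kalman gain, between 0 and 1
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


if __name__ == "__main__":
    # equal prior and observation variances: the posterior mean is the midpoint
    print(kalman_update(prior_mean=0.0, prior_var=1.0, obs=2.0, obs_var=1.0))
```

The posterior variance never exceeds the prior variance, which is the sense in which the documentary record refines the model-based estimate; a noisier observation (larger obs_var) pulls the posterior mean less far from the prior.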

Keywords

Bayesian inference

Fused lasso

Simple kriging

Penalized maximum likelihood

Kalman filter 

Abstracts


Co-Author(s)

Hsin-Cheng Huang, Academia Sinica
Kuan-hui Elaine Lin, National Taiwan Normal University
Wan-Ling Tseng, National Taiwan University

First Author

Eric Sun

Presenting Author

Eric Sun