Tuesday, Aug 6: 2:00 PM - 3:50 PM
6049
Contributed Posters
Oregon Convention Center
Room: CC-Hall CD
Main Sponsor
Section on Statistics and the Environment
Presentations
Solar Spectral Irradiance (SSI) is an important quantity in geophysical research, but missing data due to instrument downtime poses challenges. Existing methods, such as matrix completion and linear interpolation, struggle with recovery because they lack temporal smoothness and do not accommodate the 11-year SSI cycle driven by periodic solar magnetic activity.
This paper introduces SoftImpute with Projected Auto-regressive regularization (SIPA), a matrix factorization-based algorithm addressing downtime missingness. SIPA combines matrix low-rank pursuit and temporal smoothness regularization, offering an efficient alternating algorithm. A projection to the Auto-regressive (AR) penalty term prevents disturbance on non-downtime entries.
We prove the algorithm's non-decreasing property, analyze convergence rates, and design model assumptions for uncertainty quantification. An optimal sample splitting strategy for universal inference is given. Through simulated and real data, experimental validation demonstrates SIPA's superiority over existing methods in recovering downtime-induced missing Solar Spectral Irradiance data.
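As a point of reference for the low-rank component, here is a minimal sketch of plain SoftImpute (iterated soft-thresholding of singular values). It is not SIPA: the projected AR penalty is not implemented, and the function name, the toy matrix, and the choice of whole-column downtime are assumptions made for illustration only.

```python
import numpy as np

def soft_impute(X, observed, lam, n_iter=200, tol=1e-8):
    """Plain SoftImpute (not SIPA): soft-threshold the SVD of the filled-in matrix.

    X        : 2-D array; values at unobserved entries are ignored
    observed : boolean mask, True where X was actually measured
    lam      : threshold subtracted from every singular value
    """
    Z = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        # Keep measured entries and fill the gaps with the current estimate.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        converged = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if converged:
            break
    return Z

# Toy rank-1 "wavelength x time" matrix (hypothetical data).
rng = np.random.default_rng(0)
truth = np.outer(rng.normal(size=8), rng.normal(size=30))
observed = np.ones_like(truth, dtype=bool)
observed[:, 10:13] = False  # assumed downtime: every wavelength missing at these times

Z = soft_impute(truth, observed, lam=0.1)
```

In this toy setup the downtime columns stay at zero, because a low-rank penalty alone carries no information across time. That is the kind of gap a temporal regularizer is meant to fill.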
Keywords
solar irradiance
vector time series
missing data imputation
matrix low-rank completion
alternating minimization
uncertainty quantification
Abstracts
Permanent snow and ice play a crucial role in Earth's ecological system, affecting both climate and hydrology. However, their accurate classification remains challenging, especially in remote areas where field data collection is difficult. In this study, we leverage data from the 2007 Alaska National Resources Inventory (NRI) survey, publicly available remote sensing data, and global glacier inventory data to improve the classification of permanent snow and ice in Alaska. More specifically, our aims are: (i) to develop machine learning methods that classify permanent snow and ice in line with the NRI definition, using extensive publicly available remote sensing data; (ii) to produce annual maps of permanent snow and ice in Alaska to assist in the sampling design and statistical inference of future surveys. To overcome issues of class imbalance and improve model training, we integrate multiple data sources, create relevant variables, and use the Random Forests algorithm for classification, achieving 98.6% accuracy. Furthermore, we apply a cross-conformal prediction approach to quantify the uncertainty in the Random Forests prediction.
Keywords
Permanent snow and ice
Random forests classification
Remote sensing data
Abstracts
Spatial statistics has long relied on measures of second-order dependence (e.g., covariance functions and variograms) to characterize and model spatial dependence. In the turbulence literature, higher-order spatial dependence measures, such as third- and fourth-order structure functions (variograms), have been instrumental in characterizing important behavior such as turbulent energy cascades. Here, we investigate the use of these higher-order structure functions to better characterize the dependence structure of spatial extremes, which can help with the specification of appropriate statistical models for such dependence. We illustrate the approach on simulated and real-world environmental data.
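For readers unfamiliar with the term, an order-p structure function at lag h is the mean of the p-th power of increments, E[(Z(s + h) - Z(s))^p]. The sketch below computes the empirical version on a regularly spaced 1-D transect. The function name and the 1-D setting are assumptions for illustration, not the authors' estimator.

```python
def structure_function(z, lag, order):
    """Empirical structure function of a regularly spaced 1-D transect:
    the mean of (z[i + lag] - z[i]) ** order over all available pairs."""
    if lag < 1 or lag >= len(z):
        raise ValueError("lag must be between 1 and len(z) - 1")
    increments = [z[i + lag] - z[i] for i in range(len(z) - lag)]
    return sum(d ** order for d in increments) / len(increments)

# A linear ramp has constant increments, so the order-3 value at lag 2 is 2 ** 3.
print(structure_function([0, 1, 2, 3, 4], lag=2, order=3))  # 8.0
```

Odd orders keep the sign of the increments, so they can pick up asymmetry that the second-order variogram cannot see.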
Keywords
structure functions
spatial extremes
turbulence
Abstracts
Tree-ring data is used to reconstruct past climate and to predict future climate trends. In each year of a tree's lifespan a distinct ring is added to the tree's width, and widths of individual rings vary depending on the environment in which a tree lives. To process tree-ring data, two cores from each tree are extracted, and then all cores are combined before analysis. We aim to improve the modeling of future climate scenarios by assessing the accuracy of tree-ring data collection. Our investigation of data from the International Tree-Ring Data Bank found that correlated tree cores do not necessarily have the same ring widths, and trees with low correlation may have worse correlation with local climate data. These findings imply that only trees with moderate or better internal correlation should be used for climate modeling. To target differences among tree-ring data processing methods, we collected cores from trees in Elon University Forest. With this data, we combined and correlated widths of rings on each core with local climate. By combining these cores according to existing dendrochronological methods, we recommend the approaches that produce the best fit with local climate.
Keywords
Tree-rings
Climate
Environment
Dendrochronology
Climate Modeling
Abstracts
When estimating trends in contaminated media, it is common to jointly observe apparent outliers and non-detects (i.e., left-censored observations). Identifying outliers usually requires de-trending the time series prior to screening for outlying residuals. The screening in turn requires a reference distribution from which to judge outlying points.
The combination of censored data, nonlinear trends, and outliers raises challenges: 1) how to estimate the trend prior to treating non-detects, and vice-versa? 2) how to compute 'censored residuals' from the trend? 3) how to build a reference distribution given substantial censoring?
We previously proposed a Monte Carlo mixture model that samples non-detects from a class of bounded distributions on the interval (0, DL), where DL is a left-censoring limit. We illustrate how this mixture model can accurately identify outliers by constructing a broad trend using the mean estimate of repeated draws from the mixture model, and studentizing the trend residuals to both flag and down-weight outliers via an appropriate kernel applied to the studentized distance from the trend.
The benefits of this strategy are explored.
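The two ingredients described above, averaging repeated draws for non-detects and down-weighting by studentized distance, can be sketched as follows. This is only an illustration under stated assumptions: the uniform distribution stands in for the unspecified class of bounded distributions on (0, DL), the Tukey bisquare kernel stands in for the unspecified kernel, and both function names are hypothetical.

```python
import random
import statistics

def mean_of_draws(values, is_nondetect, dl, n_draws=500, seed=0):
    """Average repeated imputations; non-detects are drawn from Uniform(0, dl)
    (an assumed stand-in for the bounded distributions in the abstract)."""
    rng = random.Random(seed)
    totals = [0.0] * len(values)
    for _ in range(n_draws):
        for i, (v, nd) in enumerate(zip(values, is_nondetect)):
            totals[i] += rng.uniform(0.0, dl) if nd else v
    return [t / n_draws for t in totals]

def bisquare_weights(residuals, cutoff=4.685):
    """Studentize residuals by their sample standard deviation, then apply a
    Tukey bisquare kernel (assumed choice): weight 1 at zero, 0 beyond cutoff."""
    scale = statistics.stdev(residuals)
    weights = []
    for r in residuals:
        u = r / (cutoff * scale)
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return weights

# Hypothetical series: the second and fourth values were reported below DL = 0.5.
series = mean_of_draws([1.2, 0.5, 0.9, 0.5], [False, True, False, True], dl=0.5)
weights = bisquare_weights([0.0, 0.1, -0.1, 0.05, -0.05, 10.0])
```

The residuals passed to `bisquare_weights` would come from whatever trend is fitted to the averaged series. The trend fit itself is left out here.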
Keywords
Outliers
Left-Censored Data
Time Series
Environmental
Monte Carlo
Trends
Abstracts
Estimation of a lepidopteran's instar prior to pupation has applications in applied research. We modeled successive larval instars for the study species Hyphantria cunea based on head capsule width data using Gaussian mixture models (GMMs) fit by Expectation-Maximization (EM). To generate starting values under the assumption of n instars, we calculated n head capsule widths consistent with Brooks-Dyar spacing with the smallest value positioned at a small quantile q1 of the observed head capsule widths and the largest value positioned at a large quantile qn of the observed widths. We used the means of the n resulting clusters as starting values for the means of the Gaussian distributions, the variances of the head capsule widths in each cluster as starting values for the variances of the Gaussian distributions, and the proportions in each cluster as starting values for the mixing proportions of the Gaussian distributions. We used Brooks-Dyar spacing to select the number of instars. We found that this form of seed, in contrast to other methods of generating seeds, produces reliable, rapid convergence of the EM algorithm to biologically reasonable models.
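A rough sketch of how such EM starting values might be computed is given below. Brooks-Dyar spacing is taken to mean a geometric progression of head capsule widths between the two quantiles. The nearest-center cluster assignment, the default quantile levels of 0.05 and 0.95, and the function name are assumptions, since the abstract does not specify them.

```python
import statistics

def brooks_dyar_seeds(widths, n_instars, q_low=0.05, q_high=0.95):
    """Starting means, variances and mixing proportions for a 1-D GMM.

    Centers follow a geometric progression from the q_low quantile to the
    q_high quantile; each width joins its nearest center (assumed rule)."""
    if n_instars < 2:
        raise ValueError("need at least two instars")
    xs = sorted(widths)

    def quantile(q):
        # Simple order-statistic quantile on the sorted data.
        return xs[min(len(xs) - 1, max(0, round(q * (len(xs) - 1))))]

    lo, hi = quantile(q_low), quantile(q_high)
    ratio = (hi / lo) ** (1.0 / (n_instars - 1))
    centers = [lo * ratio ** k for k in range(n_instars)]

    clusters = [[] for _ in centers]
    for w in widths:
        nearest = min(range(n_instars), key=lambda j: abs(w - centers[j]))
        clusters[nearest].append(w)
    if any(len(c) == 0 for c in clusters):
        raise ValueError("an empty cluster; try a different number of instars")

    means = [statistics.fmean(c) for c in clusters]
    variances = [statistics.pvariance(c) for c in clusters]
    proportions = [len(c) / len(widths) for c in clusters]
    return means, variances, proportions

# Hypothetical head capsule widths (mm) for three instars.
widths = [0.29, 0.30, 0.31, 0.44, 0.45, 0.46, 0.66, 0.675, 0.69]
means, variances, proportions = brooks_dyar_seeds(widths, n_instars=3)
```

The three returned lists can be passed as initial parameters to any EM routine for a univariate Gaussian mixture.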
Keywords
Brooks-Dyar’s rule
Gaussian mixture model
instar
EM seed
Abstracts
Co-Author(s)
Mykaela Tanino-Springsteen, Department of Biological Sciences, University of Denver
Dhaval Vayas, Department of Biological Sciences, University of Denver
Audrey Mitchell, Department of Biological Sciences, University of Denver
Shannon Murphy, Department of Biological Sciences, University of Denver
First Author
Catherine Durso, University of Denver
Presenting Author
Catherine Durso, University of Denver
Lipids play a crucial role in soil ecology. They are influential for the formation of soil organic matter and serve as indicators of responses to environmental changes. In spite of this, the field of lipidomics is still in its nascent stages. There is a pressing need to compile lipid profiles from microbial isolates that can reflect the functional groups, thereby enhancing biological understanding of soil lipid data. To address this gap, we created the Soil Lipid Atlas, a comprehensive database for soil lipids. This resource enables researchers to explore lipids and their relationship with specific microbial taxa and functional groups. Within the atlas, users may select studies and compare different treatments and stressors. This enables the observation of differences in the presence and absence of lipids under different treatment conditions and across different strains. Furthermore, users can perform statistical analyses on the log2 fold changes of lipids both on a lipid-by-lipid basis and a strain-by-strain basis. The vision of this atlas is to provide a platform for researchers to contribute lipidomics data, fostering the growth of the database as a community-driven resource.
Keywords
Lipidomics
Database
Microbiome
Isolates
Abstracts
Remotely sensed observations of the atmosphere play an important role in climate research since they often have more extensive spatial coverage than surface measurements. One challenge with satellite data in particular is that an observation represents a spatial average over the satellite footprint rather than a point location. Moreover, this problem is compounded when the footprints vary in size and degree of overlap between successive observations. Our goal is to combine observations of the same process from different remotely sensed platforms into a single model, precisely accounting for heterogeneity across multiple observations and spatial averaging. We adapt earlier data fusion methods, often referred to as change-of-support methods in geostatistics, using LatticeKrig, a fixed-rank multiresolution Gaussian process model. This framework leverages sparse linear algebra and efficient basis representations to provide computational efficiency when faced with large data volumes. We demonstrate our method by fusing total column carbon monoxide (CO) from the MOPITT and TROPOMI satellite instruments for the Australasia and Maritime Southeast Asia regions.
Keywords
change-of-support
data fusion
basis function
satellite data
total column carbon monoxide
spatial statistics
Abstracts
In this study, we use the Reconstructed East Asian Climate Historical Encoded Series (REACHES) data derived from Chinese historical documents to reconstruct temperature in East Asia since the 14th century. The REACHES temperature indices exhibit bias due to missing values, primarily representing normal weather. To address this, we employ simple kriging to impute the missing data, with the mean of the underlying spatial process set to zero. To enhance temperature reconstruction accuracy, we propose a data assimilation approach that combines the kriged REACHES temperature data with the Last Millennium Ensemble (LME) reanalysis data. Our approach first estimates the temperature distribution by applying regularized maximum likelihood, incorporating a fused lasso penalty within a nonstationary time series model based on the LME data. The resulting distribution serves as the prior, which is subsequently updated to obtain refined temperatures based on the REACHES data using the Kalman filter and smoother. Our approach, which integrates historical records, climate models, and statistical techniques, sheds light on past temperature variations and refines historical temperature estimates.
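The prior-then-update step can be illustrated with a scalar Kalman filter. This sketch assumes a simple local-level (random walk) state model rather than the nonstationary model fitted to the LME data, and it omits the smoother. The function name and all parameter names are hypothetical.

```python
def kalman_filter(observations, m0, p0, q, r):
    """Scalar Kalman filter for a local-level model (assumed for illustration).

    observations : list of floats, with None where no record is available
    m0, p0       : prior mean and variance of the initial state
    q, r         : process-noise and observation-noise variances
    Returns a list of (filtered mean, filtered variance) pairs."""
    mean, var = m0, p0
    filtered = []
    for y in observations:
        # Predict: the random-walk state keeps its mean and gains variance q.
        var = var + q
        if y is not None:
            # Update: blend prediction and observation using the Kalman gain.
            gain = var / (var + r)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        filtered.append((mean, var))
    return filtered

# Equal prior and observation variance: the update lands halfway between them.
print(kalman_filter([2.0], m0=0.0, p0=1.0, q=0.0, r=1.0))  # [(1.0, 0.5)]
```

When an observation is missing, only the prediction step runs, so the variance grows until the next record arrives.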
Keywords
Bayesian inference
Fused lasso
Simple kriging
Penalized maximum likelihood
Kalman filter
Abstracts