Design of Experiments and Statistical Analysis for Modern Applications

Chair: Asuman Turkmen, The Ohio State University

Wednesday, Aug 6: 8:30 AM - 10:20 AM
Session 4144: Contributed Papers
Music City Center
Room: CC-Davidson Ballroom A3
This session focuses on advances in design and analysis of experiments as well as statistical inference for complex physical systems.

Main Sponsor

Section on Physical and Engineering Sciences

Presentations

The Power of Foldover Designs

The analysis of screening designs is often based on a second-order model with linear main effects, two-factor interactions, and quadratic effects. When the main effect columns are orthogonal to all the second-order terms, a two-stage analysis may be conducted, starting with fitting a main-effects-only model. A popular technique to achieve this orthogonality is to take any design and append its foldover runs. In this talk, we show that this foldover technique is even more powerful than originally thought, because it also includes opportunities for unbiased estimation of the variance, either by pure error or by lack of fit. We find optimal foldover designs for main effect estimation and other designs that balance main effect estimation and model selection for the important factors. A real-life implementation of our new designs involving 8 factors and 20 runs is discussed.
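The orthogonality property the abstract relies on can be checked directly: appending the sign-reversed (foldover) runs of any two-level design makes every main-effect column orthogonal to every two-factor interaction and quadratic column. A minimal sketch (the specific design here is arbitrary, not one of the talk's optimal designs):

```python
import numpy as np

rng = np.random.default_rng(0)
# Any two-level design in +/-1 coding (rows = runs); here a random 6-run, 3-factor design.
X = rng.choice([-1.0, 1.0], size=(6, 3))
# Foldover: append the sign-reversed runs.
D = np.vstack([X, -X])

# Second-order columns: two-factor interactions and (constant) quadratic columns.
inter = np.column_stack([D[:, i] * D[:, j] for i in range(3) for j in range(i + 1, 3)])
quad = D ** 2  # constant for a two-level design, so this check reduces to column balance

# Main-effect columns are exactly orthogonal to every second-order column:
# each run x is paired with -x, so the cross-products cancel in pairs.
print(np.abs(D.T @ inter).max())  # -> 0.0
print(np.abs(D.T @ quad).max())   # -> 0.0
```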

Keywords

Optimal design

Response surface design

Experimental design 

First Author

Jonathan Stallrich, North Carolina State University

Presenting Author

Jonathan Stallrich, North Carolina State University

Large, Row-Constrained Supersaturated Designs for High-Throughput Screening

High-throughput screening, in which large numbers of compounds are traditionally studied one at a time in multiwell plates, is widely used across many areas of the biological and chemical sciences, including drug discovery. To improve the efficiency of these screens, we propose a new class of supersaturated designs that guide the construction of pools of compounds in each well. Because the sizes of the pools are typically limited by the particular application, the new designs accommodate this constraint and are part of a larger procedure that we call Constrained Row Screening, or CRowS. We introduce the designs and their construction, and study their behavior as a function of the constraint. Via simulation, we show that CRowS is statistically superior to the traditional one-compound-one-well approach as well as to an existing pooling method, and, as time permits, we provide results from two separate applications, both related to the search for solutions to antibiotic-resistant bacteria.
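The setup can be illustrated with a toy version: a binary pooling matrix whose rows (wells) each contain at most k compounds, analyzed with the Lasso (one of the talk's keywords). The random pooling matrix and plain coordinate-descent Lasso below are stand-ins; the actual CRowS designs are carefully constructed, not random.

```python
import numpy as np

rng = np.random.default_rng(1)
n_wells, n_compounds, k = 40, 100, 5   # row constraint: at most k compounds pooled per well

# Hypothetical random row-constrained pooling design (illustrative only).
X = np.zeros((n_wells, n_compounds))
for row in X:
    row[rng.choice(n_compounds, size=k, replace=False)] = 1.0

beta = np.zeros(n_compounds)
beta[[3, 42, 77]] = [4.0, -3.5, 5.0]   # a few truly active compounds
y = X @ beta + rng.normal(scale=0.5, size=n_wells)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent Lasso: minimize 0.5*||y - Xb||^2 + lam*||b||_1."""
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0:
                continue                # compound never pooled: leave at zero
            z = X[:, j] @ (y - X @ b) + col_sq[j] * b[j]
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]
    return b

b = lasso_cd(X, y, lam=5.0)
print(np.flatnonzero(np.abs(b) > 1.0))  # compounds flagged as active
```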

Keywords

drug discovery

screening

experimental design

Lasso 

Co-Author(s)

Stephen Wright, Miami University
Isaac Williams, Miami University
Richard Page, Miami University
Andor Kiss, Miami University
Surendra Bikram Silwal, Miami University
Maria Weese, Miami University
David Edwards, The Citadel, The Military College of South Carolina
Brian Ahmer, The Ohio State University
Meng Wu, The Ohio State University
Emily Rego, The Ohio State University
Zhihong Lin, The Ohio State University

First Author

Byran Smucker, Henry Ford Health

Presenting Author

Byran Smucker, Henry Ford Health

Space-Filling Two-Level Factorial Designs

In this short talk, we survey the literature on binary maximin distance and minimax distance designs, covering both regular and nonregular fractions. For the class of regular 2^(n-p) fractions, we find that all minimum aberration designs with 10 or fewer factors are maximin distance designs with minimum index. For 11 or more factors there are exceptions to this rule, since there are cases where the dual of a minimum aberration design does not have minimum aberration. For nonregular fractions, we show examples where minimum G-aberration designs perform very poorly with respect to space-filling properties. Finally, we show how to reduce the computational burden of determining binary minimax distance designs.
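The distance criterion itself is simple to state: for a binary design, the maximin criterion maximizes the smallest pairwise Hamming distance between runs. A tiny sketch on the regular 2^(4-1) fraction with defining relation I = ABCD (the talk's results, of course, concern much larger design classes):

```python
import numpy as np
from itertools import product

# Regular 2^(4-1) fraction with defining relation I = ABCD: D = ABC in +/-1 coding.
runs = np.array([r + (r[0] * r[1] * r[2],) for r in product([-1, 1], repeat=3)])

def min_hamming(D):
    # Smallest number of factors in which two distinct runs differ.
    n = len(D)
    return min(int((D[i] != D[j]).sum()) for i in range(n) for j in range(i + 1, n))

# This fraction corresponds to the even-weight binary code of length 4,
# whose minimum distance is 2.
print(min_hamming(runs))  # -> 2
```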

Keywords

maximin distance

minimax distance

error-correcting codes

binary design

fractional factorial design

orthogonal array 

Co-Author(s)

Delong Li, Nankai University
Chunyan Wang, Renmin University

First Author

Robert Mee, University of Tennessee

Presenting Author

Robert Mee, University of Tennessee

Automated Analysis of Experiments using Hierarchical Garrote

In this work, we propose an automatic method for the analysis of experiments that incorporates hierarchical relationships between the experimental variables. We use a modified version of the nonnegative garrote method for variable selection which can incorporate hierarchical relationships. The nonnegative garrote method requires a good initial estimate of the regression parameters for it to work well. To obtain the initial estimate, we use generalized ridge regression with the ridge parameters estimated from a Gaussian process prior placed on the underlying input-output relationship. The proposed method, called HiGarrote, is fast, easy to use, and requires no manual tuning. Analyses of several real experiments are presented to demonstrate its benefits over existing methods.
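For orientation, the plain nonnegative garrote (Breiman's original, without HiGarrote's hierarchy constraints or GP-based initial estimate) has a closed form when the design columns are orthonormal: shrink each initial estimate by c_j = (1 - lam / beta_init_j^2)_+. A minimal sketch of that base case:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 64, 4
X, _ = np.linalg.qr(rng.normal(size=(n, p)))  # orthonormal columns: closed-form garrote
beta = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ beta + rng.normal(scale=0.3, size=n)

beta_init = X.T @ y                           # least-squares initial estimates
lam = 0.5
# Nonnegative garrote shrinkage factors: c_j = (1 - lam / beta_init_j^2)_+ in [0, 1].
# Small initial estimates are shrunk all the way to zero (variable selection);
# large ones are barely touched.
c = np.maximum(0.0, 1.0 - lam / beta_init ** 2)
beta_ng = c * beta_init
print(beta_ng.round(2))
```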

Keywords

Gaussian process

Generalized ridge regression

Nonnegative garrote

Variable selection 

Co-Author

Roshan Joseph, School of ISYE, Georgia Tech

First Author

Wei-Yang Yu

Presenting Author

Wei-Yang Yu

Pitfalls and Remedies for Maximum Likelihood Estimation of Gaussian Processes

Gaussian processes (GPs) are popular as nonlinear regression models for expensive computer simulations. Yet, GP performance relies heavily on estimation of unknown kernel hyperparameters. Maximum likelihood estimation (MLE) is the most common tool, but it can be plagued by numerical issues in small data settings. Penalized likelihood methods attempt to overcome optimization challenges, but their success depends on tuning parameter selection. Common approaches select the penalty weight using leave-one-out cross validation (CV) with prediction error. Although straightforward, it is computationally expensive and ignores the uncertainty quantification (UQ) provided by the GP. We propose a novel tuning parameter selection scheme which combines k-fold CV with a score metric that accounts for GP accuracy and UQ. Additionally, we incorporate a one-standard-error rule to encourage smoother predictions in the face of limited data, which remedies flat likelihood issues. Our proposed tuning parameter selection for GPs matches the performance of standard MLE when no penalty is warranted, excels in settings where regularization is preferred, and outperforms the benchmark leave-one-out CV. 
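The one-standard-error rule mentioned above is easy to state concretely: among candidate penalty weights, take the largest one whose mean k-fold CV score is within one standard error of the best. The scores below are made-up numbers for illustration; the talk's actual score metric combines GP accuracy and uncertainty quantification.

```python
import numpy as np

# Hypothetical 5-fold CV scores (higher = better), one row per candidate penalty weight.
lambdas = np.array([0.0, 0.01, 0.1, 1.0, 10.0])
scores = np.array([
    [-1.2, -1.0, -1.1, -0.9, -1.3],
    [-0.8, -0.9, -0.7, -1.0, -0.8],
    [-0.5, -0.6, -0.4, -0.7, -0.5],
    [-0.6, -0.5, -0.6, -0.6, -0.6],
    [-1.5, -1.4, -1.6, -1.5, -1.4],
])

mean = scores.mean(axis=1)
se = scores.std(axis=1, ddof=1) / np.sqrt(scores.shape[1])

best = int(mean.argmax())
# One-standard-error rule: the largest penalty whose mean score is within one
# standard error of the best -- favoring regularization, hence smoother fits.
eligible = np.flatnonzero(mean >= mean[best] - se[best])
chosen = lambdas[eligible.max()]
print(chosen)  # -> 1.0 (lambda = 0.1 scores best, but 1.0 is within one SE)
```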

Keywords

Gaussian processes

Computer experiments

Penalized likelihood 

Co-Author(s)

Annie Booth, Virginia Tech
Jonathan Stallrich, North Carolina State University

First Author

Ayumi Mutoh, North Carolina State University

Presenting Author

Ayumi Mutoh, North Carolina State University

A Test for a Class of Semi-Stationary Time Series with an Application to Vibration Data

A time series is second-order stationary if both its mean and covariance structure remain constant over time. Many existing methods test for second-order stationarity, as it is a crucial assumption in the analysis of classical time series and certain stationary nonlinear time series. However, few methods are available to determine whether a time series is semi-stationary. If a time series is semi-stationary, it can be analyzed much more easily than a general non-stationary time series. In this paper, we propose a new time-domain test to assess whether the normalized frequency pattern of a non-stationary time series remains unchanged over time. A robust statistical method is developed, and its asymptotic distribution is derived. A simulation study is conducted to evaluate the finite-sample performance of the proposed method. Finally, we apply the proposed method to vibration data to assess whether a mechanical system exhibits linear behavior within a certain range of inputs.
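A generic illustration of the idea (not the paper's test statistic): an amplitude-modulated oscillation is non-stationary in scale, yet its normalized frequency pattern, estimated per time block via the periodogram, stays fixed.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_blocks = 1024, 4
t = np.arange(n)
# Amplitude-modulated sinusoid: the overall scale drifts over time, but the
# *normalized* frequency pattern stays fixed -- one way to picture semi-stationarity.
x = (1.0 + t / n) * np.sin(2 * np.pi * 0.125 * t) + 0.1 * rng.normal(size=n)

specs = []
for block in np.array_split(x, n_blocks):
    p = np.abs(np.fft.rfft(block)) ** 2   # periodogram of the block
    specs.append(p / p.sum())             # normalize away the changing scale
specs = np.array(specs)

# Every block's normalized spectrum peaks at the same frequency bin
# (0.125 cycles/sample = bin 32 for blocks of length 256).
print(specs.argmax(axis=1))  # -> [32 32 32 32]
```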

Keywords

Dynamics

Periodogram

Spectral method

Robust

Semi-stationary

Vibration data 

First Author

Lei Jin, Texas A&M University-Corpus Christi

Presenting Author

Lei Jin, Texas A&M University-Corpus Christi

Signal detection under unknown background when only one unlabeled dataset is available

Searches for new physics involve detecting the presence of a specific signal in data that is contaminated by a background. This is particularly challenging when a reliable description of the background is unavailable. Our aim is to develop a statistical method to test the presence of the signal in the data and estimate the signal proportion even when the background is unknown. Moreover, we carry out the signal search using a single physics dataset generated from the experiments that may or may not contain the signal of interest. Our approach relies on using an orthonormal expansion to model the deviation between a proposal density and the unknown data-generating density. We propose choosing the proposal density in a way that ensures a conservative estimate of the signal proportion to avoid false discovery. Reliability of this approach is demonstrated through simulation studies, an application to realistic simulated data from the Fermi Large Area Telescope, and an application to data from the ATLAS experiment. We also perform a comparative analysis of our method with the so-called "safeguard" method commonly employed in particle physics and explore cases where the latter leads to false discoveries.
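The core ingredient, an orthonormal expansion of the deviation from a proposal density, can be sketched generically: transform the data through the proposal CDF and estimate coefficients on normalized shifted Legendre polynomials, which are near zero when the proposal is right and visibly nonzero when a signal component is present. This is a generic goodness-of-fit sketch under a Uniform(0,1) proposal; the talk's proposal-density choice and conservative signal-proportion estimate are not shown.

```python
import numpy as np
from numpy.polynomial.legendre import legval

rng = np.random.default_rng(4)

def lp_coeffs(u, J):
    # theta_j = sample mean of the normalized shifted Legendre polynomial T_j(u),
    # where u = proposal CDF applied to the data; theta_j ~ 0 when the proposal
    # matches the data-generating density.
    theta = []
    for j in range(1, J + 1):
        e = np.zeros(j + 1)
        e[j] = 1.0
        Tj = np.sqrt(2 * j + 1) * legval(2 * u - 1, e)  # orthonormal on [0, 1]
        theta.append(Tj.mean())
    return np.array(theta)

# Proposal: Uniform(0,1), so the CDF transform is the identity.
theta_null = lp_coeffs(rng.uniform(size=5000), J=4)            # data from the proposal
x = np.concatenate([rng.uniform(size=4500),
                    rng.normal(0.5, 0.02, size=500)])          # plus a "signal" bump
theta_sig = lp_coeffs(np.clip(x, 0.0, 1.0), J=4)
print(np.abs(theta_null).max(), np.abs(theta_sig).max())
```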

Keywords

signal detection

background

orthonormal expansion

false discovery

safeguard

ATLAS experiment 

Co-Author(s)

Sara Algeri, University of Minnesota
Lydia Brenner, Nikhef
Oliver Rieger, Nikhef

First Author

Aritra Banerjee

Presenting Author

Aritra Banerjee