Frontiers in the analysis of contaminated data in decision making

Abstract Number:

1011 

Submission Type:

Invited Paper Session 

Participants:

Tianying Wang (1), Grace Yi (2), Yanyuan Ma (3), Sharon Xie (4)

Institutions:

(1) Colorado State University, (2) University of Western Ontario, (3) Penn State University, (4) University of Pennsylvania, Perelman School of Medicine

Chair:

Tianying Wang  
Colorado State University

Session Organizer:

Tianying Wang  
Colorado State University

Speaker(s):

Grace Yi  
University of Western Ontario
Yanyuan Ma  
Penn State University
Sharon Xie  
University of Pennsylvania, Perelman School of Medicine
Tianying Wang  
Colorado State University

Session Description:

This session brings together leading experts and junior researchers working at the interface of statistical methods and applications to discuss the use and misuse of data in frontier research for decision making, especially when the observed data are contaminated with errors.

Errors-in-variables, also called measurement errors, are a well-recognized issue in many disciplines, including epidemiology, bioinformatics, engineering, and climate science. In recent years, a growing literature has acknowledged the need to account for variables that are contaminated because of limited technologies (e.g., sequencing technologies in genetics and genomics) or sampling bias. Many researchers in the domain sciences are also aware that ignoring the issue leads to biased estimation and invalid statistical inference, which can severely affect the decision-making process. This session aims to (a) increase awareness of potential data contamination issues in the big data era and (b) introduce the latest statistical methods for correcting measurement errors and making valid statistical inferences.

In this session, we will introduce recent developments in the analysis of errors-in-variables from both theoretical and applied perspectives. Four female researchers in statistics and biostatistics from the US and Canada will present their innovative error-handling approaches to a range of research problems in epidemiology, biostatistics, machine learning, and causal inference:

(1) Causal learning of paired vectors with label noise (by Grace Yi, Professor, Department of Statistical and Actuarial Sciences, Department of Computer Science, University of Western Ontario);
(2) Prediction in measurement error problems (by Yanyuan Ma, Professor, Department of Statistics, Penn State University);
(3) A pseudo-simulation extrapolation method for misspecified models with errors-in-variables in epidemiological studies (by Tianying Wang, Assistant Professor, Department of Statistics, Colorado State University);
(4) Cox model with left-truncation and error-prone survival outcomes (by Sharon Xiangwen Xie, Professor, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania).

These methods have demonstrated their power and novelty in addressing challenging problems in both statistical theory and real-world applications in data science and statistical learning.

Inspired by the theme of JSM 2024, our session focuses on data contamination, a concern for every statistician and biostatistician. The session is suitable and informative for several audiences: for those new to the area, it provides an overview of the problem and its impact on the decision-making process; for those already aware of the issue, it presents the latest statistical advances across various disciplines. We expect the session to interest a broad range of conference attendees, including researchers from academia and industry, and to encourage interaction and the sharing of ideas across disciplines. We believe this session underscores the critical need to understand contaminated data in modern data science. It has the potential to lead to further advances that benefit many fields, such as cancer diagnosis and treatment, climate change detection and attribution, and the life sciences.

Sponsors:

Caucus for Women in Statistics 3
ENAR 1
Section on Statistics in Epidemiology 2

Theme: Statistics and Data Science: Informing Policy and Countering Misinformation

Yes

Applied

Yes

Estimated Audience Size

Large (150-275)

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is nonrefundable.

I understand