AI & Statistical Approaches to Data Fusion

Piyushimita Thakuriah Chair
Rutgers University, New Brunswick, New Jersey
 
Joseph Hogan Panelist
Brown University
 
Irina Gaynanova Panelist
University of Michigan
 
Andrew Gettelman Panelist
Pacific Northwest National Laboratory
 
Brittany Segundo Organizer
The National Academies of Sciences, Engineering, and Medicine
 
Lance Waller Organizer
Emory University
 
Elizabeth Stuart Organizer
Johns Hopkins University, Bloomberg School of Public Health
 
Monday, Aug 4: 8:30 AM - 10:20 AM
0696 
Topic-Contributed Panel Session 
Music City Center 
Room: CC-104C 
Information about complex systems of interest – such as weather or climate, personal health, autonomous vehicles, robots, health service systems, even the human body – can be obtained from different types of sensors and instruments, as well as from measurement methods and models. Analysts often need to combine information about the same object or phenomenon from many datasets and model outputs, but the increasing heterogeneity of data types describing the same system – e.g., text, video, audio, images – makes it harder to integrate that information for prediction and inference. In many cases the data also need to be combined with scientific models that provide information about the structure of the system.
"Multimodal data fusion", the fusion of heterogeneous sources of data and modeled information into a unified, global view of a system, is thus important but also challenging. While there is no single definition of "data fusion" (or "data integration"), the underlying idea is to develop estimates and principled measures of uncertainty based on multiple sources of data and modeled information. Multimodal data are collections of information from diverse sensors, surveys, and measuring systems that capture complementary views of the entities and events under study.
This panel will discuss some of these opportunities and challenges, with a focus on the role of statistics and statistical thinking in these contexts. The speakers represent a range of expertise and application areas, including public health, environmental applications, and earth systems. The discussion will touch on existing methodological approaches that have been widely used for multimodal data (and model) fusion, including Hidden Markov Models, Bayesian Belief Networks, Similarity Network Fusion, and heterogeneous ensembles, to name just a few. The increasing size and heterogeneity of data have sparked interest in leveraging artificial intelligence methods, including deep learning-based approaches such as multimodal data fusion techniques built on Deep Belief Networks or Stacked Autoencoders, among others.
The panel will explore the range of barriers across domains, as well as potential solutions, and highlight statistical questions that arise in a landscape that continues to evolve, with some attention to the pros and cons of different approaches to multimodal data (e.g., model-based as compared to deep-learning-based). Panelists will probe the assumptions made under different methods and highlight contexts where these techniques might be appropriate. They will also discuss the state of the art and best practices, drawing on their expertise. This conversation is particularly timely, given the heterogeneity and size of today's data, as well as the demand for combining information across fields ranging from climate to cancer to national defense.

Applied

Yes

Main Sponsor

International Statistical Institute

Co-Sponsors

ENAR
Social Statistics Section