AI & Statistical Approaches to Data Fusion

Piyushimita Thakuriah Chair
Rutgers University, New Brunswick, New Jersey

Joseph Hogan Panelist
Brown University

Irina Gaynanova Panelist
University of Michigan

Andrew Gettelman Panelist
Pacific Northwest National Laboratory

Brittany Segundo Organizer
The National Academies of Sciences, Engineering, and Medicine

Lance Waller Organizer
Emory University

Elizabeth Stuart Organizer
Johns Hopkins University, Bloomberg School of Public Health

Monday, Aug 4: 8:30 AM - 10:20 AM
0696
Topic-Contributed Panel Session

Music City Center

Room: CC-104C

Information about complex systems of interest (such as weather or climate, personal health, autonomous vehicles, robots, health service systems, even the human body) can be obtained from different types of sensors and instruments, as well as from measurement methods and models. Information about the same object or phenomenon often has to be drawn from many datasets and model outputs, and the increasing heterogeneity of data types describing the same system (e.g., text, video, audio, images) makes it harder to integrate that information for prediction and inference. In many cases the data also need to be combined with scientific models that provide information about the structure of the system.
"Multimodal data fusion", the fusion of heterogeneous sources of data and modeled information into a unified, global view of a system across multiple modalities, is thus important but also challenging. While there is no single definition of "data fusion" (or "data integration"), the underlying idea is to develop estimates and principled measures of uncertainty based on multiple sources of data and modeled information. Multimodal data are a collection of information from diverse sensors, surveys, and measuring systems that capture complementary views of the entities and events under study.
This panel will discuss some of these opportunities and challenges, with a focus on the role of statistics and statistical thinking in these contexts. The speakers represent a range of expertise and application areas, including public health, environmental applications, and earth systems. The discussion will touch on existing methodological approaches that have been widely used for multimodal data (and model) fusion, including Hidden Markov Models, Bayesian Belief Networks, Similarity Network Fusion, and heterogeneous ensembles, to name just a few. The increasing size and heterogeneity of data have sparked interest in leveraging artificial intelligence methods, including deep learning-based approaches such as Deep Belief Net-based or Stacked Autoencoder-based multimodal data fusion techniques, among others.
The panel will explore the range of barriers across domains, as well as potential solutions to these problems, and highlight statistical questions that arise within a landscape that is continuing to evolve, with some attention to the pros and cons of different approaches to multimodal data (e.g., model-based as compared to deep learning-based). Panelists will probe the assumptions made under different methods and highlight contexts where these techniques might be appropriate. They will also discuss the state of the art and best practices, drawing on their deep expertise. This conversation is particularly timely, given the heterogeneity and size of today's data, as well as the demand for combining information across fields ranging from climate to cancer to national defense.

Applied

Yes

Main Sponsor

International Statistical Institute

Co Sponsors

ENAR

Social Statistics Section