New Advances in Functional Data Analysis

Yin Tang Chair
 
Sunday, Aug 4: 2:00 PM - 3:50 PM
5009 
Contributed Papers 
Oregon Convention Center 
Room: CC-G129 

Main Sponsor

Section on Nonparametric Statistics

Presentations

A Landmark Competing Risk Model for Dynamic Prediction

Electronic health records (EHRs) are a promising but challenging resource for investigating and monitoring disease progression. Motivated by data on hospitalized COVID-19 patients from West Bengal, India, we aim to dynamically predict the chance of "discharge" or "death" for these patients based on their longitudinal laboratory measurements. In total, there are 147,805 hospitalized COVID-19 patients with 1,091,322 laboratory measurements, and the sheer volume of data poses a computational challenge for dynamic prediction. In addition, features of EHR data such as sparsity, irregularity, and non-linearity make modelling difficult. To address these issues, we propose a two-step landmark competing risk model that first summarizes the historical laboratory measurements using functional principal component analysis (FPCA) and then uses a landmark competing risk model for prediction. The proposed method is easy to implement using existing software. All estimated model parameters, the longitudinal history, and the at-risk population vary with the landmark time. The whole dataset was randomly split into training and testing sets in a 1:1 ratio. Alternative approaches for handling longitudinal observations, including the baseline measure, the mean, the most recent measure (last value carried forward), and linear regression, are adopted in the two-stage estimation and compared with the proposed method via the weighted Harrell's C-index and the Brier score. The proposed method outperforms all of the compared methods at the distant landmark time.
Using the proposed model, we dynamically predict "death" or "discharge" at different landmark times and depict the associations of these outcomes with COVID-19 medication according to patients' historical laboratory measurements. The results provide evidence that the model has the potential to help clinicians understand patients' disease progression at different times and to inform suggestions about medication use based on historical information.
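
As a rough illustration of how the at-risk population and the longitudinal history change with the landmark time, the sketch below builds a landmark dataset from toy records. It covers only the simple comparator summaries named in the abstract (baseline, mean, last value carried forward), not the FPCA step or the competing risk fit. The function name, the record layout, and the toy values are all hypothetical.

```python
from statistics import mean


def build_landmark_rows(measurements, outcomes, landmark):
    """Return one summary row per subject still at risk at `landmark`.

    measurements: list of (subject_id, time, value) tuples (hypothetical layout)
    outcomes: dict mapping subject_id -> (event_time, event_type)
    """
    rows = []
    for sid, (event_time, event_type) in outcomes.items():
        if event_time <= landmark:
            continue  # event already occurred: no longer at risk
        # Only measurements taken up to the landmark time may be used.
        history = sorted((t, v) for s, t, v in measurements
                         if s == sid and t <= landmark)
        if not history:
            continue  # nothing observed yet for this subject
        values = [v for _, v in history]
        rows.append({
            "id": sid,
            "baseline": values[0],        # first observed value
            "last": values[-1],           # last value carried forward
            "mean": mean(values),         # mean of the observed history
            "residual_time": event_time - landmark,
            "event": event_type,
        })
    return rows


# Toy data, invented for illustration only.
measurements = [("a", 0, 5.0), ("a", 2, 7.0), ("a", 5, 9.0),
                ("b", 1, 3.0), ("c", 0, 4.0), ("c", 3, 6.0)]
outcomes = {"a": (8, "discharge"), "b": (2, "death"), "c": (6, "discharge")}

rows = build_landmark_rows(measurements, outcomes, landmark=3)
for row in rows:
    print(row)
```

Subject "b" drops out because its event precedes the landmark, and subject "a" contributes only its first two measurements. Repeating the call with a different `landmark` yields a different at-risk set and different summaries.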

Keywords

landmark model

competing risk

dynamic prediction

longitudinal data

survival 

View Abstract 2846

Co-Author(s)

Jiajia Zhang, University of South Carolina
Wenbin Lu, North Carolina State University
Pulak Ghosh, Indian Institute of Management, Bangalore

First Author

Ruilie Cai

Presenting Author

Ruilie Cai

Adaptive Leveraged Causal Inference with an Application to PCR Testing

Causal inference methods play a pivotal role in elucidating the effects of interventions and treatments in various domains including healthcare. This research proposes a novel framework that integrates double machine learning and targeted minimum loss-based estimation with Gaussian process regression to estimate treatment effects. The approach dynamically selects inducing points and model parameters based on the complexity of the data and the estimated treatment effects. We illustrate the application of our framework in the domain of medical testing where accurate estimation of treatment effects is crucial for assessing the efficacy of diagnostic tests and medical interventions. Through simulations and real-world data, we demonstrate the effectiveness of our adaptive approach in providing efficient estimates of treatment effects and improving decision-making. The research contributes to advancing the field of causal inference by introducing an adaptive approach that dynamically adjusts to the data characteristics, thereby addressing complex challenges in medical testing and intervention evaluation. 

Keywords

Causal inference

Artificial Intelligence (AI)

Double Machine Learning (DML)

Treatment effects

Adaptive

Inducing points 

View Abstract 3343

Co-Author

Han Yu, University of Northern Colorado

First Author

Felix Junior Appiah Kubi

Presenting Author

Felix Junior Appiah Kubi

Advancing Model Selection in Generalized Functional Regression Through Multivariate Functional PCA

In generalized functional regression, interpreting results from multivariate functional principal component analysis (MFPCA) applied to diverse, multi-dimensional functional data can be complex. This study introduces a model selection technique that uses a forward selection approach in MFPCA. Functional variables are added to the model incrementally, with their inclusion determined by a user-selected criterion. The method is adaptable to sparse data and to data contaminated with measurement error. We benchmark the effectiveness of this approach against existing methods. A key application of the methodology is demonstrated in a study of neonate metabolites, with the goal of understanding the relationship between longitudinal trajectories and a binary morbidity outcome. This research refines model selection strategies within generalized functional regression frameworks using MFPCA.
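
The incremental inclusion step can be sketched as a generic greedy loop. This is a minimal illustration of forward selection under a user-supplied criterion, not the authors' MFPCA procedure. The variable names, the `gains` table, and the penalized toy criterion are hypothetical stand-ins for a real fit statistic such as AIC.

```python
def forward_select(candidates, criterion):
    """Greedy forward selection.

    `criterion(subset)` returns a score where lower is better. Variables are
    added one at a time until no remaining candidate improves the score.
    """
    selected = []
    best = criterion(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-variable extension of the current model.
        scores = [(criterion(selected + [c]), c) for c in remaining]
        score, choice = min(scores)
        if score >= best:
            break  # no candidate improves the criterion: stop
        selected.append(choice)
        remaining.remove(choice)
        best = score
    return selected, best


# Hypothetical toy criterion: a base loss of 10 minus the gain from each
# included variable, plus a penalty of 1 per variable.
gains = {"x1": 5.0, "x2": 3.0, "x3": 0.2}


def toy_criterion(subset):
    return 10.0 - sum(gains[v] for v in subset) + 1.0 * len(subset)


selected, best = forward_select(["x1", "x2", "x3"], toy_criterion)
print(selected, best)
```

In this toy setup `x3` is left out because its gain does not offset the per-variable penalty. In a functional regression, `criterion` would instead refit the model on the scores of the selected functional variables.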

Keywords

Functional principal component analysis

Model selection

Generalized functional regression

Longitudinal data

Multivariate functional principal component analysis

Forward selection 

View Abstract 2307

Co-Author

Hyunkeun Cho

First Author

James Merchant

Presenting Author

James Merchant

Are Users Susceptible to Interaction with an Automated Account Different From Those That Do Not?

This work studies a year of posting behavior of social media users who interact with bot accounts and how their behavior differs from that of users who do not interact with bots. The posting behavior is described by a combination of the user's weekly number of posts, words, and ats. We propose a flexible functional regression model for the posting behavior of users, which not only provides a framework to describe and interpret how susceptible accounts differ from those that are not, but also assesses whether there is evidence that a new user, whose posting behavior has been observed repeatedly, is susceptible to bot interaction. The proposed methodology is investigated in finite samples through simulations, including scenarios that mimic the data application.

Keywords

Functional Data Analysis

Social Media

Testing

Social Bot Interaction

Posting Behavior 

View Abstract 3740

Co-Author(s)

Ana-Maria Staicu, North Carolina State University
William Rand, North Carolina State University
Zakaria Babutsidze, SKEMA Business School

First Author

Jake Koerner

Presenting Author

Jake Koerner

Functional Data Models for Dose Finding Studies

The primary aim of dose-finding studies is to pinpoint the optimal dose level based on subjects' responses, focusing on 'Efficacy' and 'Toxicity.' The optimal dose is the one that maximizes the probability of efficacy without toxicity. While some studies use Emax, quadratic, or non-linear models, these are unsuitable for non-monotonic curves. Crippa & Orsini (2016) proposed regression splines, but these may not sufficiently describe reasonable dose-response distributions. This paper introduces functional data models for dose-finding studies and applies them to meta-analysis data. We focus on three outcome probabilities: P(Efficacy), P(Toxicity), and P(Efficacy but No Toxicity), guided by monotonic and unimodal assumptions. Our functional data models estimate these probability curves and introduce adjusted confidence intervals. Finally, we apply these models to analyze data on alcohol consumption and colorectal cancer.
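
To make the "maximum probability" criterion concrete, the sketch below locates the dose that maximizes P(Efficacy but No Toxicity) on a grid. It is an illustration only and does not use the functional data models of the abstract. The logistic curves, their parameters, and the dose grid are hypothetical, and efficacy and toxicity are assumed independent so that the joint probability factorizes.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical monotone dose-response curves (parameters invented).
def p_efficacy(dose):
    return logistic(1.5 * (dose - 2.0))


def p_toxicity(dose):
    return logistic(1.5 * (dose - 5.0))


def p_success(dose):
    # ASSUMPTION: efficacy and toxicity are independent, so that
    # P(Efficacy but No Toxicity) = P(Efficacy) * (1 - P(Toxicity)).
    return p_efficacy(dose) * (1.0 - p_toxicity(dose))


# Search a grid of doses from 0 to 8 in steps of 0.1.
grid = [i * 0.1 for i in range(81)]
best_dose = max(grid, key=p_success)
print(round(best_dose, 1), round(p_success(best_dose), 3))
```

Both marginal curves are monotone in dose while the product is unimodal, which mirrors the monotonic and unimodal assumptions mentioned in the abstract.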

Keywords

Functional Data

Dose Study

Meta-Analysis Data

Efficacy and Toxicity

Functional Anova

Smoothing methods 

View Abstract 3420

Co-Author(s)

Justin Petrovich, Saint Vincent College
Sungwook Kim

First Author

Bahaeddine Taoufik, Saint Joseph's University

Presenting Author

Bahaeddine Taoufik, Saint Joseph's University

Hypothesis testing for a functional parameter via sample splitting

For testing hypotheses about a multi-dimensional parameter associated with a time series, the self-normalization (SN) method avoids bandwidth choice and is asymptotically distribution-free under the null. So far, the literature has not provided a way of using SN for inference on an infinite-dimensional parameter. In this talk, I will propose an SN-based inference method for a functional parameter via the idea of sample splitting. The proposed statistics avoid bandwidth choice and are asymptotically distribution-free. Our method has wide applicability and can be used for many time series testing problems in which an infinite-dimensional parameter is of main interest. Through simulations, we examine the finite-sample performance of the proposed methods in comparison with some existing methods, and show that they typically lead to more accurate size with a mild loss of power.
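
For orientation, the classical self-normalized statistic for a scalar mean can be computed as below. This is the familiar finite-dimensional case, not the sample-splitting method proposed in the talk. The normalizer is built from recursive partial sums, so no bandwidth is needed. The function name and the toy series are hypothetical.

```python
def sn_statistic(x, mu0):
    """Self-normalized statistic for H0: E[X_t] = mu0 (scalar case).

    T_n = n * (xbar - mu0)^2 / W_n, where
    W_n = n^{-2} * sum_{t=1}^{n} (S_t - t * xbar)^2
    and S_t is the partial sum of the first t observations.
    W_n is zero for a constant series, so such input is not supported here.
    """
    n = len(x)
    xbar = sum(x) / n
    partial = 0.0
    w = 0.0
    for t, value in enumerate(x, start=1):
        partial += value                    # S_t
        w += (partial - t * xbar) ** 2      # squared centered partial sum
    w /= n ** 2
    return n * (xbar - mu0) ** 2 / w


# Toy series, invented for illustration.
series = [1.0, 2.0, 3.0, 4.0]
print(sn_statistic(series, mu0=2.5))  # the sample mean equals mu0 here
print(sn_statistic(series, mu0=0.0))
```

Large values of the statistic are evidence against the null. Critical values come from the non-standard limiting distribution rather than from a chi-square table.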

Keywords

Time Series

Infinite Dimensional Parameter

Sample Splitting

Inference 

View Abstract 2151

Co-Author

Xiaofeng Shao, University of Illinois, Urbana-Champaign

First Author

Yi Zhang

Presenting Author

Yi Zhang

Optimal Classification of Multivariate Functional Data using Deep Neural Networks

We propose a new method, which we call Multivariate Functional Deep Neural Network (MFDNN), for classifying multivariate functional data across diverse domains. In contrast to existing approaches limited to Gaussian settings and uniform dimensional domains, MFDNN accommodates non-Gaussian data functions on varying dimensional domains (e.g., functions and images). The proposed classifier attains minimax optimality, substantiated by theoretical justifications. Demonstrations on simulated and real-world datasets underscore the versatility and efficacy of MFDNN. This approach complements recent advancements and extends previous results by exploring deep neural network procedures on multivariate functional data across different domains. Comparisons highlight the favorable performance of our method. 

Keywords

Functional data

Deep neural network

Classification

Multivariate functional data 

View Abstract 3673

Co-Author(s)

Nedret Billor, Auburn University
Roberto Molinari, Auburn University

First Author

Ukamaka Nnyaba

Presenting Author

Ukamaka Nnyaba