Tuesday, Aug 5: 10:30 AM - 12:20 PM
0639
Topic-Contributed Paper Session
Music City Center
Room: CC-102A
Functional data analysis, CGM, Diagnostics, Online risk monitoring, fMRI
Applied
Yes
Main Sponsor
Section on Medical Devices and Diagnostics
Presentations
Improving healthcare quality relies heavily on accurately evaluating and reducing misdiagnosis-related risk. Traditionally, these efforts have centered on the chart review process, which is often hindered by incomplete documentation, low inter-rater reliability, and hindsight bias. To better assess diagnostic performance and highlight areas for improvement, researchers have suggested leveraging electronic health records (EHRs) within the Symptom-Disease Pair Analysis of Diagnostic Error (SPADE) framework. However, relying solely on internal EHRs introduces bias, as it overlooks cross-over events in which patients seek follow-up care outside the hospital of their initial visit. Additionally, the low incidence of many diseases, such as stroke, increases uncertainty in assessing misdiagnosis risk. To address these issues, we propose a regression-based Dirichlet-Multinomial mixture model to estimate the distribution of misdiagnosis-related harm across hospitals and to predict misdiagnosis probabilities. Our model further enables the examination of covariates that may influence misdiagnosis-related risk, providing hospitals with actionable insights for reducing it. We evaluate our approach using simulation studies and apply it to dizziness-stroke occurrence data from the Healthcare Cost and Utilization Project (HCUP). Through these analyses, we assess misdiagnosis risk across the 216 hospitals in the dataset and identify relationships between risk of harm and hospital characteristics, such as neurological examination coverage and symptom-related patient volume.
Keywords
Misdiagnosis-related harm
Mixture model
Electronic health records
Health care improvement
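The likelihood at the heart of the abstract above can be sketched compactly. The following is an illustrative, self-contained implementation of the Dirichlet-Multinomial log-pmf and a finite-mixture log-likelihood — not the authors' regression formulation; the function names, component count, and fixed concentration parameters are all hypothetical.

```python
from math import lgamma, log, exp

def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-pmf of a Dirichlet-Multinomial: multinomial counts whose
    category probabilities are themselves Dirichlet(alpha)-distributed.

    counts: observed category counts (e.g., diagnostic outcomes at one hospital)
    alpha:  Dirichlet concentration parameters, one per category
    """
    n = sum(counts)
    a0 = sum(alpha)
    # multinomial normalizing constant n! / prod(c_i!)
    log_p = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    # ratio of multivariate Beta functions B(alpha + counts) / B(alpha)
    log_p += lgamma(a0) - lgamma(n + a0)
    log_p += sum(lgamma(c + a) - lgamma(a) for c, a in zip(counts, alpha))
    return log_p

def mixture_loglik(counts, weights, alphas):
    """Log-likelihood under a K-component Dirichlet-Multinomial mixture,
    computed with the log-sum-exp trick for numerical stability."""
    logs = [log(w) + dirichlet_multinomial_logpmf(counts, a)
            for w, a in zip(weights, alphas)]
    m = max(logs)
    return m + log(sum(exp(v - m) for v in logs))
```

In the abstract's regression setting, each component's concentration vector would be linked to hospital-level covariates rather than held fixed; the components here are constants purely for illustration.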
Functional principal component analysis (FPCA) is a key tool in the study of functional data, driving both exploratory analyses and feature construction for use in formal modeling and testing procedures. However, existing methods for FPCA do not apply when functional observations are censored, e.g., when the measurement instrument only supports recordings within a pre-specified interval, truncating values outside that range to the nearest boundary. A naïve application of existing methods, without correction for censoring, introduces bias. We extend the FPCA framework to accommodate noisy, and potentially sparse, censored functional data. Local log-likelihood maximization is used to recover smooth mean and covariance surface estimates that are representative of the latent process's mean and covariance functions. The covariance smoothing procedure yields a positive semi-definite covariance surface, computed without the need to retroactively remove negative eigenvalues in the covariance operator decomposition. Additionally, we construct an FPC score predictor, conditional on the censored functional data, and demonstrate its use in the generalized functional linear model. Convergence rates for the proposed estimators are established. In simulation experiments, the proposed method yields lower bias and better predictive performance than existing alternatives. We illustrate its practical value through an application to a study aimed at using censored functional blood glucose data to predict eating disorder diagnoses in individuals with type 1 diabetes.
Keywords
Functional principal component analysis
Scalar-on-function regression
Censored functional data
Censored predictors
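The bias that motivates the abstract above is easy to reproduce. The sketch below simulates a latent Gaussian signal clipped to an instrument's recording range (as with CGM sensors), shows that the naïve sample mean is biased toward the interior, and recovers the mean via a censored likelihood that treats boundary readings as censored rather than exact. This is a pointwise toy example under an assumed Gaussian model, not the local log-likelihood surface smoothing of the paper; all names and constants are illustrative.

```python
import math
import random

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_loglik(data, lo, hi, mu, sd=1.0):
    """Log-likelihood treating boundary values as censored, not exact."""
    ll = 0.0
    for y in data:
        if y <= lo:                 # left-censored: only know latent X <= lo
            ll += math.log(Phi((lo - mu) / sd))
        elif y >= hi:               # right-censored: only know latent X >= hi
            ll += math.log(1.0 - Phi((hi - mu) / sd))
        else:                       # observed exactly: Gaussian log-density
            z = (y - mu) / sd
            ll += -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))
    return ll

random.seed(1)
lo, hi, true_mu = -1.0, 1.0, 0.8
latent = [random.gauss(true_mu, 1.0) for _ in range(2000)]
observed = [min(max(x, lo), hi) for x in latent]    # instrument clips to [lo, hi]

naive_mu = sum(observed) / len(observed)            # biased toward the interior
grid = [-0.5 + 0.01 * k for k in range(201)]        # candidate means
mle_mu = max(grid, key=lambda m: censored_loglik(observed, lo, hi, m))
```

With heavy right-censoring the naïve mean lands well below the true value, while the censored-likelihood estimate stays close to it — the same phenomenon, applied to mean and covariance surfaces, that the proposed FPCA extension corrects.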
Disease early detection and prevention (DEDP) is an important topic in medical and public health research. Because disease risk factors are usually observed sequentially over time, DEDP is a sequential decision-making problem, and statistical process control (SPC) charts provide a powerful tool once the major complexities of the observed data (e.g., time-varying distributions and serial correlation) are properly addressed. Several SPC charts have been developed in the literature for solving the DEDP problem, but they are designed to detect a single disease. In practice, however, we are often concerned about multiple diseases (e.g., different cardiovascular diseases), and no existing SPC methods are designed for detecting multiple diseases, owing to the complexity of the problem. In this paper, a new dynamic screening system (DySS) is proposed for detecting multiple diseases. The new method first quantifies a patient's risk of each disease of concern at the current observation time and then compares the quantified risk pattern with the regular risk pattern of non-diseased people, estimated from a training dataset by a flexible longitudinal data modeling approach. The cumulative difference between the two risk patterns up to the current observation time is used to determine whether a given patient has any of the diseases of concern. Numerical studies show that the proposed method works well in different scenarios.
Keywords
Dynamic screening system
Multiple diseases
Multivariate longitudinal data
Online process monitoring
Single-index model
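The "cumulative difference between the two risk patterns" described above is in the spirit of a CUSUM chart. The sketch below is a generic one-sided CUSUM on standardized risk differences, run per disease — an illustrative stand-in, not the paper's DySS procedure; the allowance `k`, control limit `h`, and function names are assumptions.

```python
def dyss_monitor(patient_risks, regular_mean, regular_sd, k=0.5, h=4.0):
    """One-sided CUSUM of standardized risk differences.

    patient_risks: quantified risks at each observation time for one disease
    regular_mean, regular_sd: the regular (non-diseased) risk pattern,
        estimated elsewhere from a training dataset
    Returns the first time index where the chart signals, else None.
    """
    c = 0.0
    for t, (r, mu, sd) in enumerate(zip(patient_risks, regular_mean, regular_sd)):
        z = (r - mu) / sd            # deviation from the regular risk pattern
        c = max(0.0, c + z - k)      # allowance k absorbs in-control noise
        if c > h:
            return t                 # signal: risk pattern has drifted upward
    return None

def dyss_multi(risk_seqs, regular_means, regular_sds, **kw):
    """Run one chart per monitored disease; a patient is flagged if any
    disease's chart signals."""
    return {d: dyss_monitor(risk_seqs[d], regular_means[d], regular_sds[d], **kw)
            for d in risk_seqs}
```

A real DySS implementation would also standardize for the time-varying in-control distribution and serial correlation the abstract mentions; here the regular pattern is taken as given.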
Data acquisition in a functional Magnetic Resonance Imaging (fMRI) activation detection experiment yields a massively structured array- or tensor-variate dataset that needs to be analyzed with respect to a set of time-varying stimuli and possibly other covariates. The conventional approach employs a two-stage analysis: The first stage fits a univariate regression on the time series data at each individual voxel and reduces the voxel-wise data to a single statistic. The statistical parametric map formed from these voxel-wise test statistics is then fed into a second-stage analysis that potentially incorporates spatial context between the voxels and identifies activation within them. We develop a holistic yet practical tensor-variate methodology that provides one-stage tensor-variate regression modeling of the entire time series array-variate dataset. Low-rank specifications on the tensor-variate regression parameters and Kronecker separable error covariance tensors make our innovation feasible. A block relaxation algorithm provides maximum likelihood estimates of the model parameters. An R package, with C backends for computational feasibility, operationalizes our methods. Performance on different real-data-imitating simulation studies and a functional MRI study of Major Depressive Disorder demonstrate that our approach is stable and can reliably identify cerebral regions that are significantly activated.
Keywords
functional MRI
Kronecker separable models
tensor decomposition
tensor variate statistics
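The low-rank specification plus block relaxation idea from the abstract above can be illustrated in its simplest form: a rank-1 matrix-variate coefficient, where fixing one factor turns the problem into ordinary least squares in the other. This toy alternating-least-squares sketch (assuming i.i.d. errors, not the paper's Kronecker-separable covariance, and simulated rather than fMRI data) is an illustration of the principle, not the authors' algorithm or R package.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 4, 5, 400
u_true = rng.standard_normal(p)
v_true = rng.standard_normal(q)
B_true = np.outer(u_true, v_true)            # rank-1 coefficient: p*q entries,
                                             # only p + q free parameters

X = rng.standard_normal((n, p, q))           # matrix-variate covariates
y = np.einsum('npq,pq->n', X, B_true) + 0.01 * rng.standard_normal(n)

# Block relaxation: with v fixed, y ~ (X v) u is ordinary least squares in u,
# and symmetrically for v -- each sub-problem is convex and closed-form.
u = np.ones(p)
v = np.ones(q)
for _ in range(50):
    Zu = X @ v                                # (n, p) design matrix for u
    u = np.linalg.lstsq(Zu, y, rcond=None)[0]
    Zv = np.einsum('npq,p->nq', X, u)         # (n, q) design matrix for v
    v = np.linalg.lstsq(Zv, y, rcond=None)[0]

B_hat = np.outer(u, v)                        # identified up to the scale swap
                                              # (cu, v/c), so compare B itself
```

The same alternation extends to higher ranks and higher-order tensors, and to jointly updating Kronecker covariance factors, which is what makes the full tensor-variate likelihood tractable.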
Accurate classifiers that utilize novel biomarkers alongside readily available predictors can significantly enhance decision-making in various clinical scenarios, such as assessing the need for biopsies in cancer diagnosis. When classification performance is limited, a decision framework can be applied to effectively rule in or rule out diagnoses while incorporating a neutral zone for indeterminate classifications. Building on this framework, we propose a new family of two-step classifiers that selectively employ costly biomarker testing for a targeted subset of individuals undergoing multiple evaluations. The optimal solution extends the Neyman-Pearson lemma, highlighting a vital trade-off between the cost of expensive biomarker measurements and the improvement in classification performance, while minimizing uncertainty in the decision process. We demonstrate the practical utility of our approach through a biomarker study focused on prostate cancer diagnosis.
Keywords
biomarker
classification
sequential testing
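The two-step structure described above — rule in or rule out on cheap predictors, reserve the costly biomarker for the neutral zone — can be sketched as a simple decision rule. The thresholds, the equal-weight score combination, and all names below are hypothetical placeholders, not the optimal solution the abstract derives.

```python
def two_step_classify(cheap, lo, hi, measure_biomarker, threshold, w=0.5):
    """Hypothetical two-step rule with a neutral zone on the cheap score.

    Step 1: rule in (cheap >= hi) or rule out (cheap <= lo) using readily
    available predictors alone.  Step 2: only neutral-zone cases pay for
    the costly biomarker, which is combined with the cheap score for the
    final call.  Returns (label, biomarker_was_measured).
    """
    if cheap >= hi:
        return 1, False              # ruled in without the costly test
    if cheap <= lo:
        return 0, False              # ruled out without the costly test
    combined = w * cheap + (1 - w) * measure_biomarker()
    return (1 if combined >= threshold else 0), True
```

In the paper's framework, the thresholds and combination rule would be chosen to optimize the cost/performance trade-off rather than fixed in advance; the point here is only that biomarker cost is incurred solely for the indeterminate subset.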