Mental Health Statistics Section Contributed Session 1

Wenzhu Mowrey Chair

Tuesday, Aug 5: 2:00 PM - 3:50 PM
4129
Contributed Papers

Music City Center

Room: CC-212

Causal inference and machine learning in mental health

Main Sponsor

Mental Health Statistics Section

Presentations

A Calibrated Sensitivity Analysis for Weighted Causal Decompositions

Disparities in health or well-being experienced by racial and sexual minority groups can be difficult to study using the traditional exposure-outcome paradigm in causal inference, since potential outcomes in variables such as race or sexual minority status are challenging to interpret. Decomposition analysis addresses this gap by considering causal impacts on a disparity via interventions to other, intervenable exposures that may play a mediating role in the disparity. However, decomposition analyses are conducted in observational settings and require untestable assumptions that rule out unmeasured confounders. Using the marginal sensitivity model, we develop a sensitivity analysis for unobserved confounders in studies of disparities. Under mild conditions, we use the percentile bootstrap to construct valid confidence intervals for disparities and causal effects on disparities under given levels of confounding. We also explore amplifications that give insight into multiple confounding mechanisms. We illustrate our framework on a study examining disparities in youth suicide rates among sexual minorities using the Adolescent Brain Cognitive Development Study.
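
As a rough illustration of how a percentile bootstrap can be combined with bounded weights, the sketch below computes the range of a weighted outcome mean when every weight may be rescaled by a factor between 1/lam and lam, then bootstraps that range. This box constraint is a simplified stand-in and not the exact marginal sensitivity model; the function names, the parameter `lam`, and the toy data are assumptions made for this example, not the authors' estimator.

```python
import numpy as np

def weighted_mean_bounds(y, w, lam):
    """Range of sum(w*y)/sum(w) when each weight may be scaled by a factor in [1/lam, lam].

    Assumption for illustration: a simple box constraint on the weights.
    The extremes have a threshold form, so we scan all thresholds in sorted order.
    """
    order = np.argsort(y)
    y, w = y[order], w[order]
    lo_w, hi_w = w / lam, w * lam
    lower, upper = np.inf, -np.inf
    for k in range(len(y) + 1):
        w_up = np.concatenate([lo_w[:k], hi_w[k:]])  # upweight the larger outcomes
        w_dn = np.concatenate([hi_w[:k], lo_w[k:]])  # upweight the smaller outcomes
        upper = max(upper, np.sum(w_up * y) / np.sum(w_up))
        lower = min(lower, np.sum(w_dn * y) / np.sum(w_dn))
    return lower, upper

def bootstrap_interval(y, w, lam, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap: resample units, recompute the bounds, take quantiles."""
    rng = np.random.default_rng(seed)
    n = len(y)
    lowers, uppers = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample units with replacement
        lo, hi = weighted_mean_bounds(y[idx], w[idx], lam)
        lowers.append(lo)
        uppers.append(hi)
    return np.quantile(lowers, alpha / 2), np.quantile(uppers, 1 - alpha / 2)

# Toy data (hypothetical): outcomes and estimated weights for 60 units.
rng = np.random.default_rng(1)
y = rng.normal(size=60)
w = rng.uniform(1.0, 3.0, size=60)

point = np.average(y, weights=w)
interval = bootstrap_interval(y, w, lam=2.0)
```

With `lam=1` the bounds collapse to the usual weighted mean, and larger values of `lam` widen the interval, which is the basic trade-off a sensitivity analysis reports.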

Keywords

causal inference

sensitivity analysis

weighting

health disparities

observational studies

causal decomposition analysis

Co-Author(s)

Elina Visoki, Children's Hospital of Philadelphia
Ran Barzilay, Children's Hospital of Philadelphia
Samuel Pimentel, University of California-Berkeley

First Author

Andy Shen, UC Berkeley

Presenting Author

Andy Shen, UC Berkeley

A Machine Learning Framework Using Real-World Data to Unveil Heterogeneity in Schizophrenia Patients

Identifying patient subgroups with distinct clinical profiles can help personalize treatment, address unmet needs, and improve outcomes. To uncover these latent subgroups, we developed a 4-step machine-learning (ML) analytical framework and applied it to real-world claims data. The steps are: (1) automated feature extraction; (2) K-prototype clustering for subgroup identification; (3) XGBoost for risk factor selection; and (4) advanced visualizations for clinical interpretability. We identified 3 subtypes of schizophrenia patients initiating oral olanzapine, each with distinct characteristics, adherence patterns, and treatment outcomes. A high-risk subgroup with poor adherence had severe psychiatric comorbidities, a heavier healthcare resource burden, and more substance use, yet showed the strongest treatment effectiveness, suggesting that a treatment option facilitating better adherence could improve outcomes. In contrast, the older multimorbid patient subgroup with better adherence showed limited effectiveness. This study highlights the power of an ML-driven analytical framework in uncovering patient heterogeneity using real-world data, providing guidance for optimizing schizophrenia treatment in clinical practice.
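
For readers unfamiliar with K-prototype clustering (step 2), the sketch below shows the mixed-type dissimilarity it is generally built on: squared Euclidean distance on numeric features plus a weighted count of mismatches on categorical features. The function names, the weight `gamma`, and the toy data are hypothetical and chosen only for illustration; this is not the authors' pipeline, and a real analysis would likely rely on a dedicated clustering library.

```python
import numpy as np

def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Squared Euclidean distance on numeric parts plus gamma times categorical mismatches."""
    num_part = np.sum((x_num - proto_num) ** 2)
    cat_part = np.sum(x_cat != proto_cat)
    return num_part + gamma * cat_part

def assign_clusters(X_num, X_cat, protos_num, protos_cat, gamma=1.0):
    """Assign each record to the prototype with the smallest mixed dissimilarity."""
    labels = []
    for xn, xc in zip(X_num, X_cat):
        dists = [
            mixed_dissimilarity(xn, xc, pn, pc, gamma)
            for pn, pc in zip(protos_num, protos_cat)
        ]
        labels.append(int(np.argmin(dists)))
    return np.array(labels)

# Toy records (hypothetical): one numeric and one categorical feature each.
X_num = np.array([[0.1], [0.2], [5.0], [5.2]])
X_cat = np.array([["a"], ["a"], ["b"], ["b"]])

# Two fixed prototypes; a full algorithm would update these iteratively.
protos_num = np.array([[0.0], [5.0]])
protos_cat = np.array([["a"], ["b"]])

labels = assign_clusters(X_num, X_cat, protos_num, protos_cat)
```

A full K-prototype run would alternate this assignment step with updating each prototype to the numeric means and categorical modes of its members until the labels stop changing.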

Keywords

machine learning

unsupervised clustering

feature engineering

real-world data

schizophrenia

personalized treatment

Co-Author(s)

Olga Khanikova, Teva Pharmaceuticals
Sangtaeck Lim, Teva Pharmaceuticals
Weihsuan Lo-Ciganic, University of Pittsburgh
Marc Tian

First Author

Handing Xie, Teva Pharmaceuticals

Presenting Author

Handing Xie, Teva Pharmaceuticals

WITHDRAWN Durkheim's Social Integration Theory in Black Youth Suicide. A Machine Learning & Neighborhood Study

Among all racial groups in the US, the suicide rate among Black youth has increased the fastest in the past two decades, rising from 3.05 per 100,000 in 2001 to 5.99 per 100,000 in 2020. This alarming trend underscores the urgent need to study and prevent Black youth suicide as a top public health priority. Durkheim's Social Integration Theory posits that individuals are vulnerable to suicide when social integration is either extremely low or excessively high. The theory has been evaluated across various populations using separate measures of marital stability, residential stability, and religiosity in analytical models. However, to our knowledge, it has not yet been examined in the Black youth population. To address this gap, we propose a data-driven approach to develop a composite measure that captures a neighborhood's level of social integration. We then apply this measure to evaluate Durkheim's theory among Black children and youth (ages 10-17.9) with a mental health-related diagnosis between 10/1/2016 and 9/30/2022 in the INSIGHT Clinical Research Network (n=116,757), controlling for suicide attempt risk and protective factors identified through machine learning models.

Keywords

social integration

Black youth

suicide attempts

electronic health records

machine learning

neighborhood effects

Co-Author(s)

Jialin Wu, Weill Cornell Medicine
Samprit Banerjee, Cornell University, Weill Medical College

First Author

Wenna Xi, Weill Cornell Medicine

Identifiability and Inference for Generalized Latent Factor Models

Generalized latent factor analysis not only provides a useful latent embedding approach in statistics and machine learning, but also serves as a widely used tool across various scientific fields, such as psychometrics, econometrics, and the social sciences. Ensuring the identifiability of latent factors and the loading matrix is essential for the model's estimability and interpretability, and various identifiability conditions have been employed by practitioners. However, fundamental statistical inference issues for latent factors and factor loadings under commonly used identifiability conditions remain largely unaddressed, especially for correlated factors and/or non-orthogonal loading matrices. In this work, we focus on maximum likelihood estimation for generalized factor models and establish statistical inference properties under commonly used identifiability conditions. The developed theory is further illustrated through numerical simulations and an application to a personality assessment dataset.

Keywords

Maximum likelihood estimation

Generalized factor model

Limiting distributions

Co-Author

Gongjun Xu, University of Michigan

First Author

Chengyu Cui

Presenting Author

Chengyu Cui

Individualized Treatment Effect on Factorized Multi-Domain Outcomes

Personalized medicine encounters substantial challenges in mental health due to the subjective and diversified nature of the disease symptoms measured through multi-domain outcomes. Relying on a single summary measure for decision-making risks improving one symptom domain at the expense of another, underscoring the need for reliable effect estimation across multiple outcomes and various factors simultaneously. We propose a novel framework for learning individualized treatment effects with item response outcomes. This approach employs factor analysis to extract key disease factors from observed outcomes, leveraging them to construct a distributionally robust learning procedure. By jointly evaluating multi-domain treatment effects, the framework guarantees robust performance across a wide range of clinically relevant outcomes. Our method offers a computationally efficient algorithm with theoretical justification for simultaneously estimating factor loadings and treatment effects. Demonstrated in a randomized clinical trial for Major Depressive Disorder, it exhibits superior generalizability to external outcomes, underscoring its potential for advancing precision psychiatry.

Keywords

Adversarial learning

Distributional robust

Item response data

Latent factor model

Mental disorders

Precision medicine

Co-Author(s)

Molei Liu, Columbia University
Yuanjia Wang, Columbia University

First Author

Wenbo Fei, Columbia University

Presenting Author

Wenbo Fei, Columbia University

Using Structural Equation Modeling and Medicaid Data to Characterize the Hepatitis C Syndemic

Hepatitis C and HIV have drivers that interact to exacerbate each outcome. We used structural equation modeling (SEM) to characterize the hepatitis C and HIV syndemic among Medicaid beneficiaries.
We used CMS data to identify beneficiaries with chronic hepatitis C, defined as having an HCV RNA test code index date from 2016 to 2020 followed by an ICD-10 chronic code ≥1 day after the index date. We included persons aged 18-64 who were enrolled in Medicaid for ≥12 months before and after the index date and were not dually enrolled in Medicare. SEM quantified relationships of factors before the index date with HIV diagnosis afterward. Each factor was a continuous construct representing the number of overdoses, substance use disorders (SUDs), or mental health disorders (MHDs). The model allowed for correlation between constructs to estimate odds ratios (ORs), controlling for age, sex, and state.
A total of 467,340 beneficiaries with chronic hepatitis C were included. Each construct was significantly associated with HIV: MHDs (OR=1.11), overdoses (OR=1.14), and SUDs (OR=1.29). Future modeling will include beneficiaries without hepatitis C and social latent factors to better characterize the syndemic.

Keywords

structural equation modeling

factor model

hepatitis C

syndemic

Medicaid

Co-Author(s)

Michelle Van Handel, Office of the Director, National Center for HIV, Viral Hepatitis, STD, and Tuberculosis Prevention
Hasan Symum
William Thompson, Centers for Disease Control and Prevention
Taiwo Abimbola, Office of the Director, National Center for HIV, Viral Hepatitis, STD, and Tuberculosis Prevention

First Author

Angela Estadt, CDC

Presenting Author

Angela Estadt, CDC