Tuesday, Aug 6: 10:30 AM - 12:20 PM
6031
Contributed Posters
Oregon Convention Center
Room: CC-Hall CD
Main Sponsor
Section on Statistics in Epidemiology
Presentations
Despite being a leading cause of death, the global tuberculosis (TB) burden is ill-defined. Existing methods to estimate incidence are time- and/or resource-intensive and often imprecise. Backcalculation was developed to estimate HIV incidence by treating reported cases as a convolution of the incidence of new cases with the disease duration distribution. New estimates of TB natural history parameters allow us to develop Bayesian backcalculation methods for TB that appropriately assign case notification data to the time point of disease onset. Recorded counts of TB cases are known to underestimate the true burden of disease, so we develop a cure model formulation of the TB disease duration distribution to account for underreporting. We assume a Poisson distribution for case counts and incidence and use a penalized likelihood prior to smooth estimates. We estimated TB incidence for Viet Nam, Cambodia, and the Philippines from 1995 to 2019 via Markov chain Monte Carlo. Estimated TB incidence in a given year was on average 19% greater than recorded notifications. These estimates require fewer assumptions than existing methods.
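The convolution idea behind backcalculation can be illustrated with a small forward model. This is a minimal sketch, not the authors' model: the incidence series, the onset-to-notification delay distribution `delay_pmf`, and the never-notified fraction `p_never` are hypothetical values chosen for the example, and the cure-model adjustment is represented only by scaling the delay distribution by `1 - p_never`.

```python
import math

def expected_notifications(incidence, delay_pmf, p_never):
    """Expected notified cases per period: incidence convolved with the
    onset-to-notification delay, scaled by the fraction ever notified."""
    n = len(incidence)
    expected = [0.0] * n
    for t in range(n):
        for d, p in enumerate(delay_pmf):
            if t - d >= 0:
                expected[t] += incidence[t - d] * (1.0 - p_never) * p
    return expected

def poisson_loglik(counts, mu):
    """Poisson log-likelihood of observed counts given expected counts."""
    return sum(c * math.log(m) - m - math.lgamma(c + 1)
               for c, m in zip(counts, mu))

# Hypothetical inputs (assumptions for illustration only).
incidence = [100.0, 120.0, 150.0, 130.0]   # new cases per period
delay_pmf = [0.5, 0.3, 0.2]                # P(notified d periods after onset)
p_never = 0.1                              # fraction never notified

mu = expected_notifications(incidence, delay_pmf, p_never)
observed = [44, 83, 115, 122]              # made-up notification counts
loglik = poisson_loglik(observed, mu)
```

Backcalculation runs this relationship in reverse: given the observed notifications and the delay distribution, it searches for the incidence series that makes the observed counts most plausible, here under a Poisson likelihood.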
Keywords
Bayesian estimation
MCMC
Epidemiology
Biostatistics
Abstracts
Hepatitis A, a highly contagious viral liver infection, is globally widespread. Data on hepatitis A and covariates like average temperature and per capita income are collected across space and time. Consequently, the association between infectious disease outcomes and risk factors may differ across space and time. Some sub-regions may have a heterogeneous association with others, while a homogeneous temporal structure may exist within certain sub-regions. Acknowledging the potential variability in these associations, this study focused on comprehending the spatio-temporal dynamics of hepatitis A through a statistical model.
We analyzed monthly hepatitis A counts in South Korea from January 2020 to December 2021 using a Bayesian spatio-temporal model. Employing a Bayesian spatially-clustered coefficient model with temporal structures, we estimated sub-regions with temporally varying risk effects. Our goal is to use the proposed model to uncover insights into the spatio-temporally varying relationships between covariates and hepatitis A outcomes. Additionally, we addressed spatial confounding bias by incorporating a two-stage framework in our analysis.
Keywords
Hepatitis A
spatio-temporal model
spatially-clustered coefficient
spatial confounding bias
Bayesian inference
Abstracts
Background: The Dietary Intervention Study in Children (DISC) clinical trial was designed to shed light on the response of pediatric cardiovascular health to interventions over three years. We re-analyzed the original data using advanced statistical methods.
Objectives: We aimed to reassess DISC data with linear mixed-effects models and splines, examine treatment effects across subgroups, and analyze the effect of compliance on dietary outcomes.
Results: Employing B-splines minimized the AIC for the LDL analysis. The intervention lowered the average LDL by 2.32 (p-value: 0.027). Increased attendance at intervention sessions was significantly associated with a reduction in fat intake (p<0.0001). This effect varied by gender, being less pronounced in girls (p-value: 0.003). Furthermore, children from families with higher parental education and marital stability were more likely to attend sessions, subsequently influencing dietary outcomes (p-values: 0.044, 0.017).
Conclusions: Our reanalysis highlights the importance of adherence in pediatric dietary interventions for cardiovascular health. It reveals gender differences and sociodemographic impacts, suggesting tailored dietary strategies are needed.
Keywords
Dietary Intervention
Pediatric Cardiovascular Health
Linear Mixed Effects Models
Spline Models
Compliance Analysis
Abstracts
Immortal time bias is a significant issue when evaluating treatment effectiveness in observational studies to inform health policy and decision making. It is common in time-to-event drug effectiveness analyses where the index date is not clear, for example when a treatment is compared to no treatment or to a continuation of treatment. In this case, participants are assigned to groups based on data collected after the cohort entry date. Traditional methods to avoid or minimize such bias include landmark analyses with a predefined follow-up start. More recently, clinical trial emulation analyses with the clone-censor-weight approach were proposed. We performed per-protocol trial emulation analyses with and without clones, and a 3-month landmark analysis, on the mortality risk at 9 months comparing intensification vs continuation of antihypertensive treatment in 65,631 eligible patients with chronic kidney disease and high blood pressure using electronic health records from an integrated health system. We found that the results differ between approaches. The results highlight the importance of selecting the proper methods to address immortal time bias.
Keywords
Immortal time bias
Emulating Trials
clones
landmark analyses
Abstracts
Per-protocol analyses of vaccine efficacy trials typically compare event rates between participants assigned to vaccine and placebo among those who adhered to the trial protocol. However, conditioning on adherence introduces the potential for confounding bias because it occurs post-randomization. In this work, we present the goals of per-protocol analyses in vaccine efficacy trials using the Neyman-Rubin causal model. We define three effects: the intention-to-treat effect, the per-protocol cohort effect, and the causal per-protocol effect. We present the correct interpretation of these three effects, and weigh their pros and cons as effects of interest in the analysis of vaccine trials. We then introduce estimators of these three effects, focusing in particular on estimation of the causal per-protocol effect under a no unobserved confounding assumption using Inverse Probability of Treatment Weighting and Longitudinal Targeted Maximum Likelihood Estimation. We use simulation studies to demonstrate how non-adherence, confounding, and effect modification influence when these estimators can be used to make reliable conclusions about the causal effect of protocol adherence.
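The weighting idea behind an inverse-probability estimator of a per-protocol effect can be sketched as follows. This is a deliberately simplified illustration and not the estimators evaluated in the abstract: it assumes a single time point, one discrete baseline covariate (`stratum`) that captures all confounding of adherence, and made-up records for one trial arm. The function name `ipw_risk_under_adherence` is hypothetical.

```python
from collections import defaultdict

def ipw_risk_under_adherence(records):
    """Hajek inverse-probability-weighted event risk had everyone adhered.

    records: list of (stratum, adhered, event) tuples from one trial arm.
    Adherence probability is estimated as the observed proportion of
    adherers within each stratum.
    """
    n_total = defaultdict(int)
    n_adherent = defaultdict(int)
    for stratum, adhered, _ in records:
        n_total[stratum] += 1
        n_adherent[stratum] += int(adhered)

    numerator = denominator = 0.0
    for stratum, adhered, event in records:
        if adhered:
            weight = n_total[stratum] / n_adherent[stratum]  # 1 / P(adhere | stratum)
            numerator += weight * event
            denominator += weight
    return numerator / denominator

# Made-up data: younger participants adhere less often and have fewer events.
records = [
    ("young", True, 0), ("young", True, 0), ("young", False, 1), ("young", False, 1),
    ("old", True, 1), ("old", True, 1), ("old", True, 0), ("old", True, 0),
]

weighted_risk = ipw_risk_under_adherence(records)
adherers = [event for _, adhered, event in records if adhered]
naive_risk = sum(adherers) / len(adherers)
```

In this toy data the naive risk among adherers over-represents the older stratum, while the weighted estimate restores each stratum to its share of the randomized arm. Applying the same calculation to both arms and taking one minus the risk ratio would give a vaccine efficacy estimate under the stated no-unobserved-confounding assumption.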
Keywords
Causal Inference
Per-protocol analyses
Vaccine trials
Inverse probability of treatment weighting
Longitudinal targeted maximum likelihood estimation
Abstracts
We propose a communication-efficient algorithm to estimate the average treatment effect (ATE), when the data are distributed across multiple sites and the number of covariates is possibly much larger than the sample size in each site. Our main idea is to calibrate the estimates of the propensity score and outcome models using some proper surrogate loss functions to approximately attain the desired covariate balancing property. We show that under possible model misspecification, our distributed covariate balancing propensity score estimator (disthdCBPS) can approximate the global estimator, obtained by pooling together the data from multiple sites, at a fast rate. Thus, our estimator remains consistent and asymptotically normal. In addition, when both the propensity score and the outcome models are correctly specified, the proposed estimator attains the semiparametric efficiency bound. We illustrate the empirical performance of the proposed method in both simulation and empirical studies.
Keywords
Causal Inference
High-dimensional Statistics
Double robustness
Distributed inference
Communication efficiency
Likelihood approximation
Abstracts
In medical research, publication bias (PB) poses great challenges to the conclusions from systematic reviews and meta-analyses used in evidence-based medicine. The majority of efforts in research related to classic PB have focused on examining the potential suppression of studies reporting effects close to the null or statistically nonsignificant results. Such suppression is common, particularly when the study outcome concerns the effectiveness of a new intervention. On the other hand, attention has recently been drawn to the so-called inverse publication bias (IPB) within the evidence synthesis community. It can occur when assessing adverse events because researchers may favor evidence showing a similar safety profile regarding an adverse event between a new intervention and a control group. In comparison to the classic PB, IPB is much less recognized in the current literature, and methods designed for classic PB may be inaccurately applied to address IPB, potentially leading to entirely incorrect conclusions. This article aims to provide a collection of accessible methods to assess IPB for adverse events. Specifically, we discuss the relevance and differences between classic PB a
Keywords
Adverse event
inverse publication bias
publication bias
funnel plot
regression test
Abstracts
Neoadjuvant therapy is increasingly used to treat HER2-positive breast cancer. However, no study has evaluated the trend in neoadjuvant therapy over the past decade.
In this study, we included patients ≥18 years of age diagnosed with stage I to III HER2-positive breast cancer who received chemotherapy and surgery, drawn from the Surveillance, Epidemiology, and End Results Program from 2010 to 2020. Joinpoint models were used to assess trends in neoadjuvant treatment, and the association between patient characteristics and neoadjuvant treatment was evaluated with generalized estimating equations.
A total of 59,965 women with a median age of 56 years were included; 60.9% were White, 14.5% were Hispanic, 11.9% were Asian, 11.6% were Black, and 1% were of other or unknown race/ethnicity. Use of neoadjuvant chemotherapy increased from 20.1% to 46.1% between 2010 and 2020 (p-value <.001). Neoadjuvant use increased the most in stage I patients, with an average annual percent change (AAPC) of 22 (95% confidence interval [CI], 18-26.2), followed by 10.1 (8.5-11.8) and 4.4 (2.0-7.0) for stage II and III, respectively. Use of neoadjuvant therapy for HER2-positive EBC is increasing, and its survival effects need to be evaluated.
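For readers unfamiliar with the AAPC, the sketch below shows how it is commonly computed from a joinpoint fit: as a length-weighted average of the segment slopes on the log scale, transformed back to a percent. The abstract does not report segment slopes, so the `slopes` and `lengths` passed to `aapc` are hypothetical. The second function is a simpler endpoint calculation, not a joinpoint estimate, applied to the overall 20.1% and 46.1% figures reported above.

```python
import math

def aapc(slopes, lengths):
    """Average annual percent change from log-scale segment slopes,
    weighted by the number of years each segment spans."""
    weighted = sum(b * w for b, w in zip(slopes, lengths)) / sum(lengths)
    return (math.exp(weighted) - 1.0) * 100.0

def endpoint_annual_change(start, end, years):
    """Constant annual percent change that would carry `start` to `end`."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0

# Hypothetical two-segment fit: 12% annual growth for 4 years, then 6% for 6 years.
example_aapc = aapc([math.log(1.12), math.log(1.06)], [4, 6])

# Endpoint-only summary of the overall change reported in the abstract.
overall = endpoint_annual_change(20.1, 46.1, 10)  # roughly 8.7% per year
```

The endpoint figure ignores the intermediate years and any change in slope, which is why a joinpoint model is preferred when the trend is not log-linear throughout.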
Keywords
trend
joinpoint
breast cancer
Abstracts
First Author
Hui Zhao, University of Texas-MD Anderson Cancer Center
Presenting Author
Hui Zhao, University of Texas-MD Anderson Cancer Center
Multiple imputation (MI) of a variable with multiple categories can be accomplished in several ways. In the Influenza Hospitalization Surveillance Network, influenza type/subtype is a multicategory variable subject to missingness. MI of this variable presents challenges: the influenza type/subtype variable is derived from two categorical variables, only one of which has missing data; additionally, surveillance data are collected by stratified sampling. Imputing influenza type/subtype using a principled method, while accounting for the sampling design and achieving compatibility between the MI and the analysis models, can be challenging. We explored strategies for imputing missing data for this variable. We used a simulation study to compare the performance of the selected approaches. Although the proportion of observations with missing subtype was high, the missingness mechanism was likely missing at random (MAR); thus, we evaluated the benefits of using MI compared to a complete case analysis.
Keywords
Multiple imputation
multicategory variables
Abstracts
Introduction: Few studies have examined the performance of the generalized propensity score (GPS) in estimating average treatment effects (ATE) using machine learning methods in high-dimensional and nonlinear data. Objective: Use simulation to assess causal inference bias when applying multiple machine learning estimated GPSs in high-dimensional and nonlinear data. Methods: A large population was simulated with four covariates associated with a continuous treatment and a continuous outcome. Extraneous covariates were simulated for a total of four dimensionality scenarios. Additionally, treatment associations were simulated in a linear and a non-linear fashion. 1000 Monte Carlo datasets were randomly selected, and the GPS was estimated using multiple linear and machine learning algorithms (including but not limited to random forest, SVM, and deep learning). ATE was assessed for each model type and compared using bias and absolute percent relative bias from known population effects. Expected Results: Common linear model methods will perform well in linear low-dimensional scenarios, whereas machine learning methods will outperform them in high-dimensional and nonlinear scenarios.
Keywords
Generalized Propensity Score
Machine Learning
High Dimensional
Non-Linear
Abstracts
Introduction
Colorectal cancer (CRC) has high cancer-related mortality. Because of the time lag between cancer diagnosis and the initial treatment date, the decrease in mortality after a patient receives treatment may be exaggerated. This study aimed to explore the impact of immortal time bias on the mortality risk of CRC patients, in order to improve the accuracy of risk prediction after cancer diagnosis, using a real-world database, the Taiwan Cancer Registry.
Methods
This study establishes a mortality risk model for CRC patients from 2012 to 2023. To assess the immortal time bias in the mortality risk, Cox regression with landmark analysis and time-varying analysis approaches were used to estimate hazard ratios with 95% confidence intervals.
Results and Conclusion
The time interval between diagnosis and treatment could be estimated using landmark and time-varying analysis approaches to improve the accuracy of mortality prediction. The results indicated that a correctly designed methodology could improve the precision of cancer mortality estimates. In conclusion, exploring correlations over the time since diagnosis provides a more comprehensive perspective for improving CRC diagnosis and treatment.
Keywords
Immortal Time Bias
Colorectal Cancer
Taiwan Cancer Registry
Cancer Mortality
Abstracts
Influenza vaccination can attenuate severe disease in hospitalized patients, but vaccination status can be inconsistently captured in the medical record or immunization registry, necessitating provider or patient interview to verify a patient's vaccination status. In the Influenza Hospitalization Surveillance Network (FluSurv-NET), vaccination status was unknown for 15-30% of patients hospitalized with laboratory-confirmed influenza in recent seasons, even after attempting patient interviews. Implementing an imputation procedure for vaccination status could be beneficial, particularly if medical records and registries continue to yield missing vaccination status, interviews yield fewer responses, or self-reported status remains less reliable. We evaluated several individual-level factors available in the medical record that could be used as predictors in a multiple imputation model using 2022-2023 FluSurv-NET data. Race/ethnicity and state of residence were each associated with having a known influenza vaccination status based on data from the medical record, registry, or interviews. Sex, race/ethnicity, the presence of underlying medical conditions, and state were each associated
Keywords
Influenza vaccination
Multiple imputation
Abstracts
Although researchers have developed approaches for estimating high-dimensional genetic influence on cross-sectional data, there has been little work on generalizing these approaches to a mixed model for longitudinal settings. We develop a linear mixed model incorporating two separate genetic effects on the baseline and rate of change for a longitudinal response. Methodological challenges arise from the need to deal with the high-dimensional computation and to account for the crossed nature of the genetic and subject-specific random effects, which induce dependence between longitudinal measurements across all subjects. We propose a modified average information restricted maximum likelihood (AI-ReML) method to estimate the variances of these two separate genetic effects. We illustrate our methodology by examining two separate genetic effects, integrating approximately 7 million genetic variants, on the trajectory of prostate-specific antigen (PSA) level in healthy males from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Our analysis uncovers a substantial genetic influence on the rate of change in PSA level over time.
Keywords
longitudinal data
high dimension
genetic effects
ReML estimation
AI-ReML algorithm
PSA level
Abstracts
For case-control studies that assess risk from expensive-to-measure exposures like dioxin, mixing together ("pooling") biospecimens from sets of participants before assay saves money and conserves biospecimens. Large cohort studies can reduce assay costs through case-cohort analyses, where specimens from a random sample of all cohort participants, called the "subcohort", plus a random sample of cases not in the subcohort are assayed. Use of pooling for case-cohort analyses promises further savings but remains unstudied. This paper develops and evaluates a pooling strategy and a corresponding data-analysis method for using biospecimen pooling for case-cohort analyses in large cohort studies that relate biospecimen-based exposure measures to risk of disease. Our proposed strategy constructs pooling sets for cases not in the subcohort after grouping them according to time of diagnosis, and constructs pooling sets separately for members of the subcohort, grouping them on age at entry. We performed simulation studies to evaluate the performance of our approach for cohorts and subcohorts of various sizes.
Keywords
pooling
case-cohort study
cohort study
Abstracts
Predictive analytics has been used in children's services. Factors associated with child maltreatment may be multi-dimensional and mixed. However, few studies have examined children at risk from a socioecological framework. The study aims to develop a predictive model for the early detection of children at high risk using diverse linked administrative datasets and machine learning methods. Administrative data from the health, human services, police affairs, and education sectors between 2011 and 2018 were retrieved and integrated. A 1:10 matched case-control method with different predictive analytics techniques was used to build a prediction model. There were 4431 children who were reported as at risk before their fifth birthday and 40837 controls between 2012 and 2018. In general, the risk model developed in this study performed well, with both precision and recall greater than 0.90. The identified risk factors associated with children at risk varied across age groups. The developed risk model can serve as a decision aid in practice to help detect children at high risk early.
Keywords
Machine learning
Predictive analytics
Child protection
Abstracts
Estimating the causal treatment effects by subgroups is important in observational studies when the treatment effect heterogeneity may be present. Existing propensity score methods rely on a correctly specified model. Model misspecification results in biased treatment effect estimation and covariate imbalance. We proposed a new algorithm, the propensity score analysis with guaranteed subgroup balance (G-SBPS), to achieve covariate balance in all subgroups. We further incorporated nonparametric kernel regression for the propensity scores and developed a kernelized G-SBPS (kG-SBPS) to improve the subgroup balance of covariate transformations in a rich functional class. This extension is more robust to propensity score model misspecification. Extensive numerical studies showed that G-SBPS and kG-SBPS improve subgroup covariate balance and subgroup treatment effect estimation (ATE), compared to existing methods. We applied G-SBPS and kG-SBPS to a dataset on right heart catheterization to estimate the subgroup ATEs on the hospital length of stay and a dataset on diabetes self-management training to estimate the subgroup ATEs for the treated on the hospitalization rate.
Keywords
Causal inference
Subgroup analysis
Nonparametric kernel regression
Covariate balance
Inverse probability weighting
Treatment effect heterogeneity
Abstracts
Co-Author
Liang Li, University of Texas MD Anderson Cancer Center
First Author
Yan Li, Mayo Clinic
Presenting Author
Yan Li, Mayo Clinic
Potentially preventable hospitalizations (PPH) are used to measure health system performance. During the COVID-19 pandemic, disruptions to healthcare may have hindered disease treatment and increased the risk of PPH. Messaging to stay home when sick, as well as post-infection sequelae, may have compounded risks among those with COVID-19. Using an emulated target trial design with monthly sequential trials, we compared the risk of a PPH among 189,136 Veterans with COVID-19 between March 1, 2020, and April 30, 2021, and 943,084 matched uninfected comparators. The primary outcome was a first PPH in Veterans Health Administration (VHA) hospitals, or in community hospitals paid by either VHA or Medicare fee-for-service. Extended Cox models were used to examine adjusted hazard ratios (aHRs) of PPH among Veterans with COVID-19 and comparators during varying follow-up periods: 0-30, 0-90, 0-180, and 0-365 days. In total, 3.1% (3.8% of infected and 3.0% of comparators) of Veterans had a PPH during one-year follow-up. The risk of a PPH was greater among Veterans with COVID-19 than comparators in all four follow-up periods: 0-30-day aHR=3.26; 0-90-day aHR=2.12; 0-180-day aHR=1.69; 0-365-day aHR=1.44.
Keywords
emulated target trial
preventable hospitalization
extended Cox modelling
SARS-CoV-2
COVID-19
Veterans
Abstracts
Co-Author(s)
Meike Niederhausen, Oregon Health & Science University
Yumie Takata, College of Health, Oregon State University
Alex Hickok
Mazhgan Rowneki, Department of Veterans Affairs
Holly McCready, Center to Improve Veteran Involvement in Care, VA Portland Health Care System
Valerie Smith, Duke University
Thomas Osborne, VA Palo Alto Health Care System
Edward Boyko, VA Puget Sound Health Care System
George Ioannou, VA Puget Sound Health Care System
Matthew Maciejewski, VA Durham Health Care System
Elizabeth Viglianti, VA Ann Arbor Health Care System
Amy Bohnert, VA Ann Arbor Health Care System
Ann O'Hare, VA Puget Sound Health Care System
Theodore Iwashyna, Johns Hopkins University
Denise Hynes, Center to Improve Veteran Involvement in Care, VA Portland Health Care System
First Author
Diana Govier, Center to Improve Veteran Involvement in Care, VA Portland Health Care System, Portland, OR
Presenting Author
Alex Hickok
Objective: Timely identification of outliers is critical for disease surveillance and public health intervention. However, real-time outbreak detection on live surveillance data is challenging due to issues including thresholding and the handling of secular trends.
Methods: Five anomaly detection methods were applied to monthly syphilis surveillance data for all US counties from 2014-2021. Known syphilis outbreaks were compiled from the Health Alert Network and state health departments. Summary statistics compared known and detected outbreaks for each method.
Results: Methods accounting for both spatial and temporal components of the data outperformed purely spatial or temporal methods. Spatiotemporal methods correctly detected higher percentages of county-months in a known outbreak state and additional county-months as potential outbreaks as compared to temporal and spatial methods separately.
Discussion: While each method had some success in detecting known syphilis outbreaks, all methods have room for improvement. Future extensions include analyzing multiway stratified demographic data to facilitate the identification of outbreaks otherwise masked by population level noise.
Keywords
Anomaly detection
Spatiotemporal models and time series
Scan statistics
Signal processing
Disease surveillance
Outbreak detection
Abstracts
Pathway mediation analysis is widely used to identify biological mechanisms linking environmental exposures with disease outcomes. Previous methods such as HIMA1 and HIMA2 use penalized regression to identify individual omics biomarkers as potential mediators among high-dimensional omics data. However, these methods overlook correlated omics biomarkers within a pathway and omics biomarkers shared across multiple pathways, and so fail to identify key pathways.
We proposed a novel method using overlapped group lasso and principal component analysis to assess the association of pathways with exposures and outcomes. A joint significance test was applied to identify mediators among overlapped and correlated pathways. Simulations were based on the correlation structure from a real study. In 1000 simulations detecting three mediation pathways in a high-dimensional setting, our method achieved a power of 0.86 with a computation time of 5 minutes, giving higher power and shorter computation time than HIMA1 or HIMA2.
Our method offered higher power and more efficient computation in detecting mediation pathways than the existing methods for both high- and low-dimensional data.
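The joint significance test mentioned above has a simple core: a candidate mediator is declared significant only if both the exposure-to-mediator and the mediator-to-outcome associations are significant, which amounts to comparing the larger of the two p-values with the threshold. The sketch below shows only that step. The pathway names and p-values are made up, the function name is hypothetical, and the group lasso and principal component steps of the proposed method are not reproduced here.

```python
def joint_significance(p_exposure_to_mediator, p_mediator_to_outcome, alpha=0.05):
    """Joint significance test for mediation.

    Returns the joint p-value (the larger of the two path p-values) and
    whether it falls below the significance threshold alpha.
    """
    p_joint = max(p_exposure_to_mediator, p_mediator_to_outcome)
    return p_joint, p_joint < alpha

# Hypothetical path p-values for three candidate pathways.
pathways = {
    "pathway_A": (0.001, 0.020),
    "pathway_B": (0.300, 0.001),
    "pathway_C": (0.010, 0.004),
}

results = {name: joint_significance(p_a, p_b) for name, (p_a, p_b) in pathways.items()}
selected = sorted(name for name, (_, significant) in results.items() if significant)
```

With many candidate pathways the threshold `alpha` would normally be adjusted for multiple testing; that adjustment is left out of this sketch.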
Keywords
Mediation Analysis
high dimensional omics data
pathway analysis
Abstracts
We determined how family history and/or the number of APOE4 carrier alleles affect the probability of incurring a serious cognitive impairment within ten years, given a person's entry age, while accounting for the competing risks of premature death or dropout. A serious cognitive impairment is defined by a clinical diagnosis of dementia or a mild cognitive impairment as verified by an informant. During annual follow-up a participant can either be in a normal cognitive state or display some transient or permanent cognitive deficit. Given an entry age in the range 75-90, the probability of a serious cognitive impairment within ten years of follow-up is low (range 9-18%) in the presence of no risks but increases by a factor of 1.7 with one risk and 2.4 with two or more risks. This is based on longitudinal data from the BRAiNS (Biologically Resilient Adults in Neurological Studies) cohort, which recruited participants with a mean entry age of 75.2 ± 7.3 (48% with no risks, 37% with one risk, and 15% with two or more risks) and has as many as 31 years of follow-up. These results have implications for recruiting participants to a longitudinal study of cognitive changes in the elderly.
Keywords
cognitive impairment
dementia
mild cognitive impairment
ten year follow-up
family history
APOE4
Abstracts
Public health data are often spatially dependent, but standard spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially-correlated residuals. This could occur if, for example, there is an unmeasured environmental contaminant. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has previously been proposed as a solution, but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression using Gaussian processes with Matérn covariance to estimate the spatial trends, and term this class of estimators Double Spatial Regression (DSR). We derive regularity conditions for root-n asymptotic normality and consistency and closed-form variance estimation, and show that in simulations where standard spatial regression estimators are highly biased and have poor coverage, DSR can mitigate bias more effectively than competitors and obtain nominal coverage.
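The two-stage idea described above, removing an estimated spatial trend from both the explanatory and the response variable and then regressing residual on residual, can be sketched in a few lines. This is an illustration under stated assumptions rather than the proposed DSR estimator: a Nadaraya-Watson kernel smoother stands in for the Gaussian process with Matérn covariance, no cross-fitting or variance estimation is performed, and the coordinates, trend, and bandwidth are made up.

```python
import math

def kernel_smooth(coords, values, bandwidth):
    """Nadaraya-Watson estimate of the spatial trend at each observed location."""
    fitted = []
    for xi, yi in coords:
        num = den = 0.0
        for (xj, yj), v in zip(coords, values):
            w = math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (2.0 * bandwidth ** 2))
            num += w * v
            den += w
        fitted.append(num / den)
    return fitted

def double_residual_slope(coords, exposure, response, bandwidth=0.2):
    """Remove the estimated spatial trend from exposure and response,
    then regress the response residuals on the exposure residuals."""
    ex_res = [a - f for a, f in zip(exposure, kernel_smooth(coords, exposure, bandwidth))]
    y_res = [y - f for y, f in zip(response, kernel_smooth(coords, response, bandwidth))]
    return sum(a * y for a, y in zip(ex_res, y_res)) / sum(a * a for a in ex_res)

# Made-up data on a 5 x 5 grid: a smooth trend affects both exposure and response.
coords = [(i / 4, j / 4) for i in range(5) for j in range(5)]
trend = [x + y for x, y in coords]
noise = [0.5 if k % 2 == 0 else -0.5 for k in range(len(coords))]
exposure = [t + e for t, e in zip(trend, noise)]
response = [2.0 * a + 3.0 * t for a, t in zip(exposure, trend)]  # true slope is 2

slope = double_residual_slope(coords, exposure, response)
```

Because the smoother only approximates the trend, the estimated slope will generally be close to, but not exactly, the true value of 2 in this toy example.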
Keywords
Bias reduction
Semiparametric regression
Spatial confounding
Gaussian Process
Abstracts
The study of Low Back Pain (LBP) and its associated risk factors, through multi-state models (MSMs) of individuals' transitions between distinct LBP states, provides valuable insight into managing and preventing this condition. Data come from an LBP research consortium that compiled information from several longitudinal studies involving midwestern manufacturing workers. Repeated observations of individuals were made over approximately 6 years to determine changes in their LBP status: (1) LBP lasting for longer than 7 days, (2) LBP requiring medical care, and (3) LBP resulting in lost time from work. Several MSMs are considered to incorporate different combinations of case definitions as states. Particular interest is given to psychosocial risk factors to better understand their association with LBP transition probabilities. The probability of experiencing LBP is tripled on average for individuals who feel a lack of support and satisfaction in their jobs. An R package under development is used to assist with data preparation. The msm package is used for model fitting.
Keywords
Multi-State Model
Low Back Pain
Psychosocial Factors
Longitudinal Data
R
Occupational Health
Abstracts