03 - Comprehensive Analysis of Elastographic Liver Disease Biomarkers and Volatile Organic Compounds (VOCs) with Significant Undetectable Levels in NHANES

Conference: Women in Statistics and Data Science 2022
10/07/2022: 2:30 PM - 4:00 PM CDT
Speed 
Room: Grand Ballroom Salon G 

Description

Nonalcoholic fatty liver disease (NAFLD) is a clinicopathologic diagnosis based on the presence of fat in the liver, with or without inflammation and fibrosis. NAFLD spans a spectrum from simple steatosis through steatohepatitis and fibrosis to cirrhosis. Volatile organic compounds (VOCs) are mostly manmade chemicals with high vapor pressure and low water solubility. They are commonly found in paints, dry cleaning agents, industrial solvents, and even pharmaceuticals, and they are known to contaminate groundwater. We proposed to study the association of hepatic steatosis, liver fibrosis, and high-risk nonalcoholic steatohepatitis (NASH) with VOCs using the National Health and Nutrition Examination Survey (NHANES) 2017-18. VOCs can be detected in blood, urine, and other body fluids; however, the VOC data available in NHANES are based on detection levels in the serum only. Hepatic steatosis was measured using the Controlled Attenuation Parameter (CAP) score from Vibration Controlled Transient Elastography (VCTE) with FibroScan®, liver fibrosis was measured using the Liver Stiffness Measurement (LSM), and high-risk NASH was determined using the FAST score (calculated from CAP, LSM, and AST).
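The FAST score mentioned above combines LSM, CAP, and AST through a logistic function. The sketch below is illustrative only: the coefficients come from the published FAST derivation and are not stated in this abstract, so they should be checked against the original publication before use. The function name and the units in the argument names (kPa, dB/m, U/L) are assumptions.

```python
import math


def fast_score(lsm_kpa: float, cap_db_m: float, ast_u_l: float) -> float:
    """Return a FAST score in (0, 1) from LSM (kPa), CAP (dB/m), and AST (U/L).

    Assumption: coefficients follow the published FAST derivation; they do
    not appear in the abstract itself.
    """
    if lsm_kpa <= 0 or ast_u_l <= 0:
        raise ValueError("LSM and AST must be positive")
    linear = (
        -1.65
        + 1.07 * math.log(lsm_kpa)      # natural log of liver stiffness
        + 2.66e-8 * cap_db_m ** 3       # cubic term in CAP
        - 63.3 / ast_u_l                # inverse of AST
    )
    # Logistic transform: exp(x) / (1 + exp(x)) written in a stable form
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical participant values, for illustration only
score = fast_score(lsm_kpa=8.6, cap_db_m=286, ast_u_l=30)
```

A score computed this way can then be compared against a cutoff such as the 0.35 threshold used later in the analysis.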
The VOC data in NHANES pose several challenges for statistical analysis because of inherent limitations: non-normality of the data, a large number of VOC variables, and low serum detection rates for most VOCs. These analytical issues and the remedial measures taken are described below with the help of a case study.
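To make the low-detection-rate problem concrete, the following sketch computes a detection rate for a single VOC and substitutes values below the lower limit of detection (LLOD). Substituting LLOD divided by the square root of 2 is a common convention for such values and is an assumption here, not a method confirmed by the abstract. The concentrations, the LLOD, and the function names are all hypothetical.

```python
import math


def detection_rate(values, llod):
    """Fraction of non-missing measurements at or above the detection limit."""
    observed = [v for v in values if v is not None]
    if not observed:
        return float("nan")
    return sum(v >= llod for v in observed) / len(observed)


def substitute_below_llod(values, llod):
    """Replace values below the LLOD with LLOD / sqrt(2); None stays missing."""
    fill = llod / math.sqrt(2)
    return [v if v is None or v >= llod else fill for v in values]


# Hypothetical serum concentrations for one VOC, with a hypothetical LLOD
LLOD = 0.024
measurements = [0.010, 0.017, 0.050, None, 0.012, 0.031, 0.017]

rate = detection_rate(measurements, LLOD)
filled = substitute_below_llod(measurements, LLOD)
# Count how many entries now share the same substituted constant
n_constant = sum(v == LLOD / math.sqrt(2) for v in filled if v is not None)
```

When most measurements fall below the LLOD, most of the substituted column is a single constant, which is why a log transformation alone cannot restore normality.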
The case study had three main objectives: testing associations between a) VCTE measurements and demographic covariates, b) VOCs and demographic covariates, and c) VCTE measurements and VOCs in the presence of demographic covariates. For the first set of tests, LSM, CAP, and FAST were used as dependent variables; for the second, the VOCs were used as dependent variables. For both sets of tests, the independent variables were age, gender, race, body mass index (BMI), diabetes, and alanine aminotransferase (ALT). For the third set, we investigated the association of LSM, CAP, and FAST with VOCs, using all VOCs and covariates as independent variables. The analysis was divided into two phases: a) traditional and b) non-traditional.
For the traditional analysis, normality tests showed that LSM and FAST had skewed distributions, whereas CAP was normally distributed. Therefore, univariable and multivariable analyses were conducted on log-transformed LSM and FAST values. In this analysis, missing VOC values were imputed, which resulted in a high number of constant values for the VOCs. As a result, log transformation could not resolve the non-normality, and non-parametric methods were chosen to analyze associations between VOCs and the covariates. Additionally, bivariate analysis was performed based on specific cutoffs for LSM (8.6 kPa for clinically significant fibrosis (CSF)), CAP (286 dB/m for any steatosis), and FAST (0.35 for NASH with fibrosis). In these bivariate analyses, chi-square tests and t-tests were used to assess associations with categorical and continuous variables, respectively.
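The cutoff-based bivariate step can be sketched as follows. The cutoffs are those given above; treating a value equal to the cutoff as positive (>=) is an assumption, as are the function names and the cell counts in the example. The chi-square test shown is the uncorrected Pearson statistic for a 2x2 table, with the p-value obtained from the 1-degree-of-freedom identity P(X > x) = erfc(sqrt(x / 2)).

```python
import math

# Cutoffs stated in the text: LSM in kPa, CAP in dB/m, FAST unitless
CUTOFFS = {"LSM": 8.6, "CAP": 286.0, "FAST": 0.35}


def dichotomize(value, measure):
    """True when the value meets or exceeds the cutoff (>= is an assumption)."""
    return value >= CUTOFFS[measure]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival, 1 df
    return stat, p_value


# Hypothetical counts: rows = diabetes yes/no, columns = LSM >= 8.6 yes/no
stat, p_value = chi_square_2x2(10, 20, 20, 10)
has_csf = dichotomize(9.3, "LSM")
```

For continuous covariates, the same dichotomized outcome would define the two groups compared by a t-test.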
In the non-traditional analysis, principal components were identified from the 40 VOCs present in the NHANES dataset. Principal component analysis was used to reduce dimensionality, because a model becomes less reliable as the number of predictors it includes grows. In addition, a Bayesian kernel machine regression (BKMR) was fitted to evaluate associations between the elastographic liver disease biomarkers and the VOCs.
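As a rough illustration of the dimension-reduction idea, the sketch below extracts the first principal component of a small standardized data matrix by power iteration on its correlation matrix. It is not the authors' implementation: the data are hypothetical, only three variables are used instead of 40, and a real analysis would rely on an established statistical library and account for the survey design.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    if any(sd == 0 for sd in sds):
        raise ValueError("constant column: cannot standardize")
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def first_principal_component(rows, iterations=500):
    """Return (loadings, share of variance) for the leading component."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    corr = [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    v = [1.0 / math.sqrt(p)] * p  # simple start; fails if orthogonal to PC1
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector is orthogonal to the leading component")
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue; total variance equals p
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return v, eigenvalue / p


# Hypothetical concentrations for three VOCs across five participants
data = [
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 3.0],
    [3.0, 6.2, 6.0],
    [4.0, 7.9, 2.0],
    [5.0, 10.1, 4.0],
]
loadings, share = first_principal_component(data)
```

Here the first two columns are almost perfectly correlated, so a single component captures most of their joint variation. That is the motivation for replacing many correlated VOCs with a few components.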
Based on the analyses described above, an R Shiny application is being developed for use by researchers conducting similar analyses.
Lastly, guidance from NHANES on analyzing variables with a low frequency of detection would be beneficial, as it would help in achieving consistent and generalizable results.

Keywords

NHANES

Liver Disease

Missing Data

Multi-pollutant Models

Bayesian Kernel Machine Regression

Principal Component Analysis 

Presenting Author

Rachana Lele

First Author

Rachana Lele

CoAuthor(s)

Matthew Cave, University of Louisville
Manjiri Kulkarni, University of Louisville
Shesh Rai, University of Louisville
Niharika Samala, Indiana University School of Medicine

Target Audience

Mid-Level

Tracks

Knowledge
Women in Statistics and Data Science 2022