05/01/2025: 3:45 PM  - 5:15 PM  MDT
   
              
               Refereed 
               
   
   
   
   
    Room: Alpine East 
   
      
      
                         
Chair
                         
                Sunghwan Byun, North Carolina State University 
                  
               
      
   
                  Target Audience
                  
               Mid-Level
               
                  Tracks
                  
               Practice and Applications
               
               Software & Data Science Technologies
               
               Statistical Data Science
               
            Symposium on Data Science and Statistics (SDSS) 2025
         Presentations
 
            
            
            
          
            
           
            
              Advances in AI and automation are reshaping educational workflows, making processes more efficient, accurate, and equitable. In collaboration with the Illinois State Board of Education (ISBE), the American Institutes for Research (AIR) has modernized the traditional school needs assessment process. These assessments, which once required months of manual effort, now run on streamlined workflows that cut timelines to weeks while giving schools more customized and actionable insights.
Central to this transformation is the AI Findings Pipeline, which uses WhisperX and GPT to automate transcription, speaker identification, and tagging of focus group audio. The tool turns hours of discussion into customized, AI-generated findings in minutes. Alongside it is the Report Running Data Pipeline, a system that integrates Airtable automations with AWS technologies to produce tailored, data-rich reports on demand. These reports combine AI-generated findings with key school metrics and survey data, offering a holistic view of school performance and critical needs.
Together, these tools provide researchers and school leaders with timely, evidence-based recommendations, supporting ISBE's Equity Journey Continuum and broader equity goals. By integrating structured data with qualitative expertise, this approach highlights the potential of AI to simplify complex processes, minimize logistical burdens, and drive impactful improvements in educational systems. This methodology underscores the growing importance of leveraging technology to meet the evolving challenges of education while maintaining a focus on equity and inclusivity. 
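
The transcript-processing step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AIR's implementation: the diarized segments are hard-coded dictionaries with hypothetical `speaker` and `text` keys standing in for transcription output, a simple keyword lookup stands in for the GPT tagging step, and the theme names and keywords are invented for the example.

```python
from itertools import groupby

# Hypothetical diarized segments, standing in for transcription output.
SEGMENTS = [
    {"speaker": "SPEAKER_00", "text": "Our buses are often late."},
    {"speaker": "SPEAKER_00", "text": "Attendance suffers because of it."},
    {"speaker": "SPEAKER_01", "text": "Teachers need more planning time."},
    {"speaker": "SPEAKER_00", "text": "Families want clearer communication."},
]

# Hypothetical themes; a keyword lookup replaces the LLM tagging step.
THEMES = {
    "transportation": ("bus", "buses"),
    "attendance": ("attendance",),
    "staffing": ("teachers", "planning"),
    "family engagement": ("families", "communication"),
}


def merge_turns(segments):
    """Join consecutive segments from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(segments, key=lambda s: s["speaker"]):
        text = " ".join(s["text"] for s in group)
        turns.append({"speaker": speaker, "text": text})
    return turns


def tag_turn(text):
    """Return the sorted list of themes whose keywords appear in the text."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return sorted(t for t, keywords in THEMES.items() if words & set(keywords))


def tag_turns(turns):
    """Attach a 'tags' list to a copy of each turn."""
    return [dict(turn, tags=tag_turn(turn["text"])) for turn in turns]


tagged = tag_turns(merge_turns(SEGMENTS))
for turn in tagged:
    print(turn["speaker"], turn["tags"])
```

In a real pipeline the keyword lookup would be replaced by a model call, but keeping the merge and tagging steps as separate functions makes each stage easy to check on its own.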
         
              
           
           
        
                           
Presenting Author
                           
                  Graham Chickering, American Institutes for Research 
                   
                     
                  
                           First Author
                           
                  Graham Chickering, American Institutes for Research 
                   
                     
                  
                           CoAuthor(s)
                           
                  Christina Jones, American Institutes for Research 
                   
                     
                  
                  Collin Heckman, American Institutes for Research 
                   
                     
                  
           
           
           
           
           
           
        
            
            
            
          
            
           
            
              The increasing adoption of artificial intelligence (AI) across regulatory and healthcare domains highlights its transformative potential in addressing critical public health challenges. The U.S. Food and Drug Administration (FDA) has identified adverse drug event (ADE) detection as a priority area for innovation, as outlined in its strategic initiatives. Timely and accurate identification of ADEs is critical for ensuring patient safety and informing regulatory decisions. However, leveraging the FDA Adverse Event Reporting System (FAERS) for ADE detection remains fraught with challenges, including data heterogeneity, reporting inconsistencies, and scalability issues.
Recent advances in generative AI, machine learning (ML), and large language models (LLMs) offer a promising path forward. A recent study demonstrated the efficacy of fine-tuned LLMs, such as GPT-3.5, in analyzing detailed vaccine adverse event reports in the Vaccine Adverse Event Reporting System (VAERS) (Li et al., 2024). Using 91 annotated reports, the authors developed AE-GPT, a tool for automatically extracting and categorizing adverse events, setting a new benchmark in ADE detection. 
Our research builds on this precedent, aiming to enhance ADE detection by fine-tuning LLMs for FAERS datasets. FAERS contains millions of masked case reports spanning 2004 to 2024, with data fields covering demographic, administrative, drug, reaction, and patient outcome information. We use embeddings from LLMs to classify case severity and identify features predictive of severity, providing a multi-strata classification scheme for ADE detection. We use logistic regression as a baseline and compare the results to standard ML models including a Random Forest classifier, DBSCAN, and XGBoost. Our framework achieved notable results, demonstrating the potential of LLMs for processing complex medical data and highlighting their ability to enhance early ADE detection. 
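
The baseline step mentioned above (logistic regression on embedding vectors) can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: the "embeddings" are synthetic 2-D points rather than LLM embeddings of FAERS reports, severity is reduced to a binary label, and the learning rate and epoch count are arbitrary. It is not the authors' model or data.

```python
import math
import random


def make_synthetic_embeddings(n_per_class, rng):
    """Toy 2-D 'embeddings': class 0 near (-2, -2), class 1 near (2, 2)."""
    data = []
    for label, center in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            x = [rng.gauss(center, 0.5), rng.gauss(center, 0.5)]
            data.append((x, label))
    rng.shuffle(data)
    return data


def sigmoid(z):
    """Logistic function, with the input clamped to avoid overflow."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim = len(data[0][0])
    w, b, n = [0.0] * dim, 0.0, len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predict 1 (severe) when the modelled probability is at least 0.5."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w, b, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = make_synthetic_embeddings(100, rng)
train, test = data[:150], data[150:]
w, b = fit_logistic(train)
test_accuracy = accuracy(w, b, test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

A held-out score from a simple model like this gives the reference point against which tree-based models would be compared.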
         
              
           
           
        
                           
Presenting Author
                           
                  John Riddles, Westat 
                   
                     
                  
                           First Author
                           
                  Joshua Turner, Westat 
                   
                     
                  
                           CoAuthor(s)
                           
                  John Riddles, Westat 
                   
                     
                  
                  Julianna Lee, Westat 
                   
                     
                  
                  Jeremy Corry, Westat 
                   
                     
                  
                  Rashi Saluja 
                   
                     
                  
                  Sean Chickery, Westat 
                   
                     
                  
                  Gizem Korkmaz, Westat 
                   
                     
                  
                  Marcelo Simas, Westat 
                   
                     
                  
                  Kevin Wilson, Westat 
                   
                     
                  
           
           
           
           
           
           
        
            
            
            
          
            
           
            
              Triple-negative breast cancer (TNBC) has a higher recurrence rate and higher overall mortality than other molecular subtypes in the U.S. Studies have shown that African American (AA) women are genetically more likely to develop advanced TNBC than Caucasian American (CA) women. In Louisiana (LA), there were 3,790 TNBC cases from 2010 to 2017, of which 1,861 (49.1%) were AA and 1,900 (50.1%) were CA, even though only 32.8% of the LA population was AA and 62.8% was CA. Notably, 43.5% of the AA patients were diagnosed with regional or distant metastasis, compared with 36.6% of CA patients. Thus, stage at TNBC diagnosis is a significant component of racial health disparities in LA.
Our research is based on data collected by the Louisiana Tumor Registry (LTR) from 2010 to 2017. In addition to the routinely collected standard data, LTR linked related variables with U.S. census tract-level environmental factors from the National-Scale Air Toxics Assessment (NATA), along with the environmental justice indices (EJI). A total of 3,225 adult female TNBC patients were included in the dataset. Among them, 1,675 (51.9%) were AA and 1,550 (48.1%) were CA. We used Bayesian mediation analysis to identify environmental risk factors and quantify how much each explains the racial disparities in stage at diagnosis among TNBC patients in Louisiana.
There is a significant association between race and stage at diagnosis (p-value < 0.001). The disparity was partially explained by the collected mediators. The significant mediators were the patient's age at diagnosis (25.89%), insurance (4.71%), poverty index (26.16%), and the environmental chemical naphthalene (8.38%).
In LA, a high proportion of Black residents live in Cancer Alley. This exposes them to high levels of toxic emissions that contain carcinogens such as naphthalene. Improving access to health insurance, reducing poverty-related barriers, and reducing exposure to naphthalene can help with early detection of TNBC. 
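
The figures quoted in this abstract can be combined with some simple arithmetic. The sketch below uses only numbers stated above. The variable names are hypothetical, and summing the mediator percentages assumes they are additive shares of the total effect measured on the same scale, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract.
CASES = {"AA": 1861, "CA": 1900}
TOTAL_CASES = 3790
POPULATION_SHARE = {"AA": 0.328, "CA": 0.628}
MEDIATOR_SHARE_PCT = {
    "age at diagnosis": 25.89,
    "insurance": 4.71,
    "poverty index": 26.16,
    "naphthalene": 8.38,
}


def representation_ratio(group):
    """Share of TNBC cases divided by share of the state population."""
    return (CASES[group] / TOTAL_CASES) / POPULATION_SHARE[group]


# A ratio above 1 means the group is over-represented among TNBC cases.
ratios = {group: round(representation_ratio(group), 2) for group in CASES}

# Assumption: the mediator percentages are additive shares of the total effect.
explained_pct = round(sum(MEDIATOR_SHARE_PCT.values()), 2)
unexplained_pct = round(100 - explained_pct, 2)

print(ratios)
print(explained_pct, unexplained_pct)
```

Under that additivity assumption, the four listed mediators together account for roughly two thirds of the disparity, which is consistent with the statement that the disparity was only partially explained.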
         
              
           
           
        
                           
Presenting Author
                           
                  Nubaira Rizvi, LSU-Health New Orleans 
                   
                     
                  
                           First Author
                           
                  Nubaira Rizvi, LSU-Health New Orleans 
                   
                     
                  
                           CoAuthor(s)
                           
                  Xiao-Cheng Wu, Louisiana Tumor Registry 
                   
                     
                  
                  Bin Li, Louisiana State University 
                   
                     
                  
                  Qingzhao Yu