Wednesday, Aug 6: 10:30 AM - 12:20 PM
4188
Contributed Papers
Music City Center
Room: CC-101D
Main Sponsor
Section on Statistics and Data Science Education
Presentations
The term "AI" was once used loosely to describe computer bots. Now that computers can perform complex tasks, we must find ways to use Artificial Intelligence to make a positive impact in education. Through a project-based approach, students experience real-world applications of data analysis in Biology using three technologies: AI, the R program, and Adobe Portfolio. Students are guided to use free AI tools (e.g., ChatGPT) to generate biological research questions, simulate random data, and analyze the data using techniques and methods taught in class. With the help of AI tools, students learn to ask ChatGPT how to code their analysis in the (free) R program and present their results by creating a website with Adobe Portfolio. With measurable learning outcomes for the projects at the end of the semester and opportunities for formative feedback at "check points" throughout each project, students gain a better understanding of the process. This paper will discuss the project-based approach with student examples, along with the results of a survey designed to gather student feedback for enhancements in future semesters.
Keywords
AI
Adobe Portfolio
Statistical Education
The widespread adoption of Large Language Models has made distinguishing between human- and AI-generated essays more challenging. This study explores AI detection methods for domain- and prompt-specific essays within the Diagnostic Assessment and Achievement of College Skills (DAACS) framework, applying both random forest and fine-tuned ModernBERT classifiers. Our approach incorporates pre-ChatGPT essays, which are likely human-generated, alongside synthetic datasets of essays generated and modified by AI. The random forest classifier was trained with open-source embeddings such as miniLM and RoBERTa, as well as a low-cost OpenAI model, using a one-versus-one strategy. The ModernBERT method employed a novel two-level fine-tuning strategy, incorporating essay-level and sentence-pair classifications that combine global text features with detailed sentence transitions through coherence scoring and style consistency detection. Together, these methods effectively identify whether essays have been altered by AI. Our approach provides a cost-effective solution for specific domains and serves as a robust alternative to generic AI detection tools, all while enabling local execution on consumer-grade hardware.
Keywords
artificial intelligence, machine learning, large language models, text classification, AI detection, ModernBERT, random forest, embedding models, academic integrity
AI ethics, chatGPT, text analysis, coherence detection, style consistency, AI-generated content, synthetic data, DAACS, self-regulated learning, deep learning
essay assessment, diagnostic assessment, educational technology, sentence-pair analysis, miniLM, RoBERTa, OpenAI
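As a rough illustration of the random forest setup described in this abstract, the sketch below trains a one-versus-one random forest on embedding vectors with scikit-learn. It is not the authors' pipeline: the embeddings are random synthetic vectors standing in for real essay embeddings, and the three class labels, the embedding dimension, and all hyperparameters are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier

# ASSUMPTION: three classes (human, AI-generated, AI-modified); the abstract
# does not state the exact label set.
labels = ["human", "ai_generated", "ai_modified"]

# ASSUMPTION: synthetic 32-dimensional vectors stand in for essay embeddings
# (e.g., from miniLM or RoBERTa). Each class gets a shifted mean so the
# classifier has some signal to learn from.
rng = np.random.default_rng(0)
n_per_class, dim = 100, 32
X = np.vstack(
    [rng.normal(loc=i * 1.5, scale=1.0, size=(n_per_class, dim)) for i in range(len(labels))]
)
y = np.repeat(labels, n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One-versus-one: one binary random forest is fitted per pair of classes,
# and the pairwise votes are combined to pick the final label.
clf = OneVsOneClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print(f"pairwise forests: {len(clf.estimators_)}, held-out accuracy: {accuracy:.2f}")
```

With three classes the wrapper fits three pairwise forests. In a real application, the synthetic matrix X would be replaced by embeddings computed from the essay texts.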
For instructors trained as statisticians using R, the shift to teaching Introduction to Data Science courses with Python presents a challenge. AI has proven to be a valuable tool for automatically converting R code into Python, which has allowed me to integrate both languages seamlessly into my lectures. In this paper, I will demonstrate how to use AI for R-to-Python code conversion and show how to incorporate the side-by-side code in Quarto documents alongside Beamer presentations. This approach not only helps to teach both languages effectively but also enhances the learning experience for students by exposing them to multiple programming paradigms at once.
Keywords
AI in Education
R to Python Conversion
Data Science Education
Quarto for Teaching
AI-Assisted Code Translation
Students trained in data analytics programs are often judged by future employers on the depth and breadth of their analytical programming knowledge. Exposure to multiple languages, in particular Python, R, SQL, and SAS, will make students more attractive on the job market. Some faculty may wish to focus on teaching one primary language and augment this learning with examples of how to achieve the same result in other languages. Others may have enough instruction time to fully commit to teaching multiple languages. SAS Viya Workbench for Learners (www.sas.com/wfl), a free cloud software for academics, provides a Visual Studio interface for writing Python, R, SQL, and SAS code, so that these languages can be taught and learned at the same time within one software environment. Code from multiple languages can be run within one notebook or organized across many notebooks. Comparison of syntax across languages is easy, setup is minimal, and no installation is required. Git integration is also included. Examples of Python, R, SQL, and SAS code to perform data cleaning and statistical modeling will be shown.
Keywords
software
notebook
Large Language Model (LLM) tools (e.g., ChatGPT) are increasingly helping statistics/data science (DS) courses foster self-efficacy, personalize learning, and make data science accessible to students with less coding training. However, students with inadequate understanding of how LLM tools work may use them counterproductively, thus hindering their learning and problem-solving abilities. To address this, we developed an interactive LLM Literacy curriculum to help students (1) learn LLM fundamentals and then (2) develop best practices for using LLM tools as statistics/DS aids. The modules focus on debugging and statistical design, integrating literature on best practices in these fields with best practices for the use of LLMs as learning aids. The curriculum is tool-agnostic and adaptable to evolving LLM tools. We incorporated the curriculum into a graduate statistics/DS course for biomedical students and found significant improvements in students' LLM prompt-writing practices, ability to solve statistics/DS problems, and confidence in their skills. These findings underscore the importance of LLM literacy training as a necessary part of modern statistics/DS education.
Keywords
large language model
statistics education
data science education
generative AI
statistical literacy
computing
Metabolic alterations in cancer cells are a fundamental characteristic of tumorigenesis. However, limited research has been performed to identify metabolic expression adaptation signatures at the protein level in cancer datasets. In this study, metabolic gene expression datasets from the National Cancer Institute (NCI) Proteomic Data Commons were analyzed to evaluate metabolic protein abundance changes across cancers. In addition, sub-system level metabolic pathway alterations and how they correlate with cancer progression were investigated. Patient metadata, including cancer subtypes, pathological stage, and race/ethnicity, were used to identify features of metabolic protein adaptations driving classifications across tumor subtypes, during cancer progression, and across different patient populations. Gene set enrichment analysis (GSEA) and machine learning approaches were applied to examine protein alterations associated with sub-system metabolic pathways. Understanding metabolic gene expression changes in cancer as the result of metabolic adaptations will enhance our knowledge of cancer biology and highlight functionally important metabolic processes.
Keywords
Metabolic Adaptations
Cancer Metabolism
Proteomic Data Analysis
GSEA
Machine Learning