Wednesday, Aug 6: 10:30 AM - 12:20 PM
4174
Contributed Posters
Music City Center
Room: CC-Hall B
Main Sponsor
Section on Statistics and Data Science Education
Presentations
Central limit theorems (CLTs) have a long history in probability and statistics and play a fundamental role in constructing valid statistical inference procedures. Over the last century, various techniques have been developed to prove CLTs under a variety of assumptions on random variables, and quantitative versions of CLTs (e.g., Berry–Esseen bounds) have been developed in parallel. In this article, we propose to use approximation theory from functional analysis to derive explicit bounds on the difference between expectations of functions. We provide bounds on the difference between functions of random variables using level sets of functions together with classical uniform and non-uniform Berry–Esseen bounds for univariate random variables. The resulting bounds can be applied to single-layer neural networks and to functions on [-1,1]^d with finite weighted norm integrable Fourier transform. These functions belong to Barron space. Unlike the classical bounds that depend on the oscillation function of f, our bounds do not have an explicit dimension dependence.
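For readers less familiar with the classical univariate results the abstract refers to, the standard statements can be written as follows. This is a reminder of textbook results, not the authors' new bounds; the notation (i.i.d. X_1, ..., X_n with mean zero, variance \sigma^2, third absolute moment \rho, and absolute constants C, C') is assumed here for illustration.

```latex
% Setup (assumed notation): X_1,\dots,X_n i.i.d., E[X_1]=0, Var(X_1)=\sigma^2>0,
% \rho = E|X_1|^3 < \infty, F_n the CDF of (X_1+\dots+X_n)/(\sigma\sqrt{n}),
% \Phi the standard normal CDF.

% Uniform Berry--Esseen bound:
\sup_{x \in \mathbb{R}} \bigl| F_n(x) - \Phi(x) \bigr|
  \;\le\; \frac{C\,\rho}{\sigma^{3}\sqrt{n}}

% Non-uniform Berry--Esseen bound (holds for every real x):
\bigl| F_n(x) - \Phi(x) \bigr|
  \;\le\; \frac{C'\,\rho}{\sigma^{3}\sqrt{n}\,\bigl(1+|x|\bigr)^{3}}
```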
Keywords
multidimensional
central limit theorem
Berry-Esseen bound
dependence on dimension
dependence on function
K-12 teachers from Oregon are encouraged to make use of a set of science education tools and model lessons created by the research team of the Language, Culture, and Knowledge-building through Science (LaCuKnoS) project, an NSF-awarded project at Oregon State University. We developed engaging data visualization and AI application class modules, utilizing LaCuKnoS tools such as language boosters, concept cards, and other interactive learning aids. These modules aim to provide students with an understanding of essential concepts in data science, making complex topics more accessible and enjoyable. To assess the impact of these activities on students' learning outcomes, surveys are administered twice each academic year, measuring the improvement in both the students' understanding of STEM concepts and their interest in pursuing STEM-related fields. The analysis focuses on the development of students' STEM understanding and how their participation in the program influences their career preferences. Using various statistical tools, we implemented a system for evaluating K-12 students' conceptual understanding of STEM materials in the LaCuKnoS project.
Keywords
data science
K-12
STEM
education
AI
Games can offer an engaging and low-stakes method for students to review and reinforce their learning. I present the use of a cooperative, fantasy-themed board game designed to help students solidify concepts covered in a second course in statistics. With the creative-writing assistance of a generative AI model, I developed a compelling narrative and game mechanisms to immerse students in a fun classroom experience.
I will detail the process of generating the game's story, designing its mechanisms, and creating the accompanying graphics. Additionally, feedback from students who participated in the game will be shared, highlighting the effectiveness and enjoyment of this educational approach. The feedback will also include suggestions for alternative game mechanisms, improvements to the game, and ideas for different themes and settings.
Keywords
Statistics Education
Game-Based Learning
As Large Language Models (LLMs) become integral to education, disparities in access to high-quality AI tools raise concerns about their impact on the education gap. This study examines the differences between free and paid LLMs in terms of accessibility, performance, and effectiveness in educational settings. By analyzing model capabilities, resource availability, and student outcomes, we assess whether free models provide equitable learning opportunities or whether paid versions create an advantage for those with financial means. Our findings offer insights into the role of LLMs in shaping the future of education and the potential need for policy interventions to ensure fair access.
Keywords
AI in education
Large Language Models
digital divide
Exploring how students learn statistics and how instructors can most effectively help them has been a focal point in statistics education research over the past few decades (Carver et al., 2016). While earlier studies focused on different teaching approaches (e.g., Simon et al., 1976; Federer, 1978), cognitive challenges and misconceptions (e.g., Brewer, 1985; Garfield and Ahlgren, 1988), and students' attitudes (e.g., Pavlick, 1975; Gal and Ginsburg, 1994), recent research has shifted toward understanding the motivational aspects of learning statistics, such as interest (e.g., Sproesser, 2016), self-efficacy (e.g., Finney and Schraw, 2003), and intrinsic motivation (e.g., Dun, 2014). We aim to explore curiosity as part of intrinsic motivation, recognizing its potential to enhance students' learning (Pluck and Johnson, 2011).
Curiosity, the desire to acquire knowledge, is integral to learning environments that actively engage students; teachers can use specific techniques to evoke curiosity, enriching the learning atmosphere (Schmitt and Lahroodi, 2008). One of the initial focuses of this cross-institutional collaboration is to see whether we can measure curiosity
Keywords
Curiosity
Statistics Education
Intrinsic Motivation
Learning
Student Engagement
Teaching Environment
Lung and colon cancers are leading causes of mortality worldwide, with variations across healthcare systems. This study uses multivariate time series modeling to analyze lung and colon cancer mortality trends in Jamaica and the U.S. from 1960 to 2014, applying Vector Autoregressive Moving Average (VARMA) models to assess interdependence. Country-specific multivariate forecasts extend 12 years beyond 2014, identifying disparities, similarities, and influencing factors. Model selection and validation use statistical metrics like MAPE, RMSE, and AIC to ensure accuracy. Monte Carlo simulations enhance predictive robustness by accounting for future variability. This research provides data-driven insights into cancer mortality trends, contributing to the development of advanced statistical models for understanding and forecasting cancer outcomes. Findings will support public health planning and policy development in both regions.
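As a brief illustration of two of the accuracy metrics named in the abstract, the sketch below computes RMSE and MAPE for a forecast against held-out observations. It is a minimal example under stated assumptions, not the authors' validation code: the function names and the toy numbers are hypothetical and are not data from the study.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Every actual value must be nonzero, since each error is scaled by it.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical toy values for illustration only (not from the study).
toy_actual = [10.0, 20.0, 40.0]
toy_forecast = [11.0, 18.0, 40.0]

toy_rmse = rmse(toy_actual, toy_forecast)
toy_mape = mape(toy_actual, toy_forecast)
```

Both metrics compare forecasts with observed values, so they are typically computed on a hold-out period; AIC, by contrast, is computed from the fitted model's likelihood and is not shown here.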
Keywords
Cancer Mortality
Time Series Analysis
VARMA
Multivariate Forecasting
Monte Carlo Simulation
Predictive Analytics
Public Health
Geographic Analysis: Jamaica, United States
This research introduces a novel two-stage cluster randomized design, the order restricted cluster randomized block design (ORCRBD). The ORCRBD builds upon the cluster randomized block design by incorporating a second layer of blocking, achieved through ranking cluster units that are randomly sampled from the population. This approach creates a two-way layout, with blocks and ranking groups, and employs restricted randomization to enhance the accuracy of treatment contrast estimation. We calculate the expected mean square for each source of variation in the ORCRBD under a suitable linear model, develop an approximate F-test for the treatment effect, assess ranking quality, calculate optimal sample sizes for a given cost model, formulate multiple comparison procedures, and apply the design to an educational setting.
Keywords
order restricted randomization
ranked set sampling
intracluster correlation coefficient
Latin square
optimal design
Transparent, trustworthy research depends on sharing data and code and having results verified by others, yet education tends to focus on best practices or knowledge-deficit models that are often insufficient for behavior change. Treating improved data practices as a behavior change problem, we adopt the Capability, Opportunity, and Motivation for Behavior change (COM-B) model and use levers in the Behavior Change Wheel to create educational materials. The work is a collaboration among Arkansas Children's Research Institute, UAMS's Institute for Digital Health & Innovation, and Indiana University School of Public Health-Bloomington's Biostatistics Consulting Center. Module 1 covers capabilities, opportunities, and motivations for data and code sharing and verification, acknowledging investigator barriers (e.g., fear of being scooped or attacked, and lack of time, know-how, and resources). Module 2 provides background on capabilities and opportunities to share to enhance reproducibility, while Module 3 covers processes and practices for sharing and verification. Self-paced materials were created using the Rise Articulate platform and are Sharable Content Object Reference Model (SCORM) and Section 508 compliant.
Keywords
Education
Reproducibility
Data sharing
Verification
Behavior change
Co-Author(s)
Stephanie Dickinson, Indiana University, Department of Epidemiology and Biostatistics
CJ Fortune, Institute for Digital Health & Innovation, University of Arkansas for Medical Sciences
Sydney Howk, Institute for Digital Health & Innovation, University of Arkansas for Medical Sciences
Kimberly Lamb, Institute for Digital Health & Innovation, University of Arkansas for Medical Sciences
Anna Macagno, Indiana University School of Public Health-Bloomington
Erik Parker, Indiana University
First Author
Andrew Brown, University of Arkansas for Medical Sciences
Presenting Author
Andrew Brown, University of Arkansas for Medical Sciences
Student attrition is an important issue for higher education, as it brings grave costs to both students and institutions. In this project, we study two-year persistence of students who enrolled at a large four-year public institution in California as first-time freshmen from Fall 2016 to Fall 2020. Predictors considered in the study include student demographic information, socioeconomic variables, academic preparation, and academic performance at the institution. Two analytical approaches are used: discrete-time survival analysis and random forest. The results from both models indicate that academic performance variables after enrollment are most strongly associated with two-year persistence, including term units earned, term GPA, whether a student is on probation, and whether a student earned units in the first summer after enrollment. Further, monitoring students with fewer than 6 earned units or a GPA below 2.0 in the first term, and providing help to them promptly, may prevent them from dropping out. We also illustrate how the random forest model may be used to provide individualized prediction of two-year persistence.
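To make the discrete-time survival idea concrete, the sketch below computes the unadjusted per-term hazard (the share of students still enrolled at the start of a term who leave during it) and the resulting persistence curve. It is a simplified life-table illustration with no covariates, not the model used in the study, which would typically relate the hazard to predictors through a regression. The function name and the toy records are hypothetical.

```python
def life_table(durations, events, n_periods):
    """Unadjusted discrete-time hazard and survival by period.

    durations[i]: last period (1-based) in which student i was observed.
    events[i]:    True if student i left in that period, False if censored
                  (still enrolled when observation ended).
    Returns a list of (period, at_risk, n_events, hazard, survival) tuples.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have equal length")
    rows = []
    survival = 1.0
    for t in range(1, n_periods + 1):
        # Students observed in period t are those whose last period is >= t.
        at_risk = sum(1 for d in durations if d >= t)
        n_events = sum(1 for d, e in zip(durations, events) if d == t and e)
        hazard = n_events / at_risk if at_risk else 0.0
        # Survival to the end of period t is the product of (1 - hazard).
        survival *= 1.0 - hazard
        rows.append((t, at_risk, n_events, hazard, survival))
    return rows


# Hypothetical toy cohort of 10 students followed for 4 terms
# (illustration only, not data from the study).
toy_durations = [1, 1, 2, 2, 3, 4, 4, 4, 4, 4]
toy_events = [True, True, True, False, True,
              False, False, False, False, False]

toy_rows = life_table(toy_durations, toy_events, n_periods=4)
```

In this formulation a censored student contributes to the at-risk count for every term in which they were observed but never counts as an event, which is what lets the method use incomplete follow-up.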
Keywords
student retention
discrete-time survival analysis
random forest
variable importance
individualized prediction