Thursday, Aug 7: 10:30 AM - 12:20 PM
4230
Contributed Papers
Music City Center
Room: CC-205C
Main Sponsor
Section on Statistics and Data Science Education
Presentations
This paper introduces a new family of distributions called the hyperbolic tangent (HT) family, whose cumulative distribution function is defined using the standard hyperbolic tangent function. The fundamental properties of the family are examined and presented. An inverse exponential distribution is then employed as a sub-model within the HT family, and its properties are also derived. The parameters of the HT family are estimated by maximum likelihood, and the performance of the estimators is assessed in a simulation study. Two real data sets demonstrate the flexibility of the HT family and its applicability in real-world scenarios.
Keywords
Goodness-of-fit
Hyperbolic tangent function
Inverse exponential distribution
Maximum likelihood estimation
Moments
Simulation
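The simulation step described in the hyperbolic tangent abstract above can be illustrated with a minimal sketch. The abstract does not state the HT cumulative distribution function, so this sketch covers only the inverse exponential distribution on its own, with CDF G(x) = exp(-theta / x) for x > 0, where the maximum likelihood estimator has the closed form n / sum(1 / x_i). The parameter value, sample sizes, replication count, and function names are illustrative assumptions, not the authors' settings.

```python
import math
import random


def sample_inverse_exponential(theta, n, rng):
    """Draw n values with CDF G(x) = exp(-theta / x), x > 0."""
    # If E ~ Exponential(1), then theta / E has CDF exp(-theta / x).
    return [theta / rng.expovariate(1.0) for _ in range(n)]


def mle_theta(sample):
    """Closed-form MLE for the inverse exponential: n / sum(1 / x_i)."""
    return len(sample) / sum(1.0 / x for x in sample)


def simulate(theta, n, replications, seed=0):
    """Return (bias, mse) of the MLE over repeated samples of size n."""
    rng = random.Random(seed)
    estimates = [
        mle_theta(sample_inverse_exponential(theta, n, rng))
        for _ in range(replications)
    ]
    bias = sum(estimates) / replications - theta
    mse = sum((e - theta) ** 2 for e in estimates) / replications
    return bias, mse


if __name__ == "__main__":
    # Illustrative settings only (assumed, not taken from the paper).
    for n in (25, 100, 400):
        bias, mse = simulate(theta=2.0, n=n, replications=2000)
        print(f"n={n:4d}  bias={bias:+.4f}  mse={mse:.4f}")
```

In a study of this kind, bias and mean squared error would normally shrink as the sample size grows. For a member of the HT family the likelihood would generally need numerical maximization instead of the closed form used here.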
This study investigates the nexus between research and teaching productivity among STEM faculty at a public research-intensive university, analyzing data from 553 faculty members across four STEM disciplines: Information and Computer Sciences, Biological Sciences, Engineering, and Physical Sciences. By applying cluster analysis with the NbClust package and logistic regression, this research explores correlations between academic productivity metrics and faculty demographics, including position type, rank, gender, and discipline. The analysis identifies distinct productivity clusters characterized by varying levels of research and teaching outcomes across demographic groups, revealing significant disparities. These findings underscore the need for institutional policies that comprehensively support both teaching and research, thereby fostering STEM faculty success. This study provides a nuanced understanding of STEM faculty productivity profiles, informing strategies for equitable institutional resource allocation, faculty development, and evaluation, ultimately contributing to the advancement of STEM education and fulfilling institutional missions.
Keywords
NbClust Package
Cluster Analysis
Logistic Regression
STEM Education
Academic Productivity (Teaching and Research)
STEM Faculty Characteristics
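The cluster-count selection mentioned in the faculty productivity abstract above can be sketched in simplified form. NbClust is an R package that compares many validity indices; the sketch below is a much smaller stand-in in Python that uses a single index, the average silhouette width, with a basic k-means. The data are synthetic two-column points (imagine a research score and a teaching score), and every name and setting is an assumption made for illustration, not the study's procedure or data.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding: first point, then the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iterations=50):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, candidates):
    """Return the candidate k with the highest average silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


def make_blobs(seed=0):
    """Synthetic data: three well-separated groups of 30 points each."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in centers
        for _ in range(30)
    ]


if __name__ == "__main__":
    best_k, scores = choose_k(make_blobs(), range(2, 6))
    for k, score in scores.items():
        print(f"k={k}  mean silhouette={score:.3f}")
    print("selected k:", best_k)
```

A single index can be misleading on real data, which is one reason a multi-index tool such as NbClust is attractive. Cluster labels obtained this way could then serve as the outcome in a logistic regression on faculty characteristics.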
We propose a Bayesian hierarchical framework for analyzing large-scale mathematics tutoring dialogues that models cognitive load as a latent variable inferred from observable behavioral patterns in educational conversations. Our approach treats response timing patterns and communication modality choices (i.e., sending text vs. images) as observable indicators of underlying cognitive states, with a two-phase experimental design comparing behavioral-only models against content-enhanced models that incorporate LLM-based understanding classification. Applied to MathMentorDB (5.4 million messages across 200,332 tutoring conversations), our method reveals bidirectional cognitive dependencies in which student confusion systematically increases tutor cognitive load, and vice versa. We demonstrate that temporal and modality patterns can reliably indicate latent cognitive states in educational dialogues, with cross-role dependencies providing new insights into collaborative learning dynamics. This work bridges research from education, Bayesian statistics, and natural language processing, providing both methodological innovations for modeling cognitive load in online learning conversations and actionable insights for designing adaptive tutoring systems.
Keywords
Large Language Models
Educational Data Mining
Bayesian Hierarchical Modeling
Artificial Intelligence (AI)
Data Science
Natural Language Processing
Public data spaces have become a cornerstone of modern data governance, enabling secure, transparent, and interoperable sharing among public and private sectors. As data-driven decision-making expands, robust design principles are essential to ensure efficiency, trust, and ethical use. This paper reviews literature from European, U.S., and international frameworks, outlining best practices and challenges in implementing public data spaces.
Building on this review, we propose a set of core guidelines for the design of good public data spaces that emphasize interoperability, privacy, governance, and stakeholder collaboration. We also offer a structured proposal for incorporating these principles into graduate curricula in statistics and data science, ensuring future professionals develop skills in data interoperability standards, privacy-preserving sharing, legal and ethical frameworks, and data stewardship.
A case study on designing a National Quality Infrastructure (NQI) data space demonstrates how well-governed public data ecosystems can improve standardization, accreditation, metrology, and quality control, ultimately enhancing economic performance and regulatory efficiency.
Keywords
Graduate Curriculum
Data Governance
Public Data Spaces
Open Data
Data Interoperability
National Quality Infrastructure
The Data Mine, currently in its seventh year, provides more than 2,000 interdisciplinary graduate and undergraduate students with hands-on experience in data science. Based on the principles of learning by doing, teamwork, and real-world data science, this model is used by learners at more than 60 colleges annually. Additionally, The Data Mine supports approximately 100 projects with Corporate Partners across many domains, including aerospace, agriculture, manufacturing, and pharmaceutical science. This model has proven to be a very effective way for colleges and companies to quickly and easily build relationships that create genuine value for partners and students alike. The newest Data Mine location, in Indianapolis, is a successful example of how this model can scale rapidly and deliver a strong return on institutional investment. This session will briefly explain why The Data Mine has become pervasive as a model for data science research across institutions of varying profiles. A case study of the past year's launch of the Indianapolis location will be included.
Keywords
industry-university partnerships
experiential learning
data science
mentoring
industry-student collaboration
student development
Participants will embark on a journey through the development and modern practice of statistics and data science in the British Isles, exploring a new course, "The History of Statistics in the UK and Ireland." Pictures from the course will help participants feel as if they were there. Site selection, course logistics, unique challenges, pedagogies, and student and faculty outcomes will be discussed. No prior knowledge is required, and instructors interested in traveling abroad will receive helpful advice. Course materials, including the course website, will be shared, providing a comprehensive model for developing similar courses. Individuals interested in the course topics are also encouraged to attend.
Keywords
study abroad
history of statistics
traveling course
course design
How can we effectively integrate data science into early education? Should it be woven into the formal curriculum or offered as part of extracurricular activities? Current literature supports both strategies, yet many mathematics teachers feel unprepared due to a lack of training in data science and in programming languages such as R and Python. We propose a slightly modified version of The Data Mine (TDM) model to tackle these hurdles. The proposed model fosters collaboration among undergraduate students, researchers, and industry professionals, creating a vibrant learning community that also engages secondary school students. By promoting experiential learning through real-world data science challenges outside the traditional curriculum, this initiative equips students with essential skills and cultivates a culture of innovation. Ultimately, early engagement in data science will prepare students for the complexities of tomorrow's world and inspire them to become proactive contributors to society.
Keywords
Data science education
secondary school
TDM
experiential learning
learning community