Tuesday, Aug 6: 8:30 AM - 10:20 AM
5079
Contributed Speed
Oregon Convention Center
Room: CC-E141
Main Sponsor
Section on Statistics and Data Science Education
Presentations
At my institution, logistic regression appears in the undergraduate statistics curriculum in a variety of courses: the second course in statistics, a course in categorical data analysis, and a course in advanced regression models. In each course, there are variations in student audience and level, software, notation, and vocabulary used when covering this topic. In this talk, I will demonstrate a Shiny app for customizing a core set of materials on logistic regression based on these characteristics of the course.
Keywords
educational materials
statistics education
Design of Experiments (DoE) is a valuable tool for the pharmaceutical industry to achieve a stable product and robust process, which translates into a reduced time to market. As such, the DoE methodology helps to identify the critical parameters and their robust operating ranges. DoE is a generic methodology that can be applied to many other sectors such as chemicals, manufacturing, or food. However, the use of DoE is not widespread in these companies. Several authors have identified barriers to the widespread use of DoE, such as a lack of teamworking skills, a perception that it requires more resources than the traditional approach, or the complexity of current DoE software tools. We present guidelines for overcoming several barriers to DoE adoption and illustrate them with a cloud-based DoE software solution. We show how new tools can improve collaboration and lower the barrier to entry for non-expert users. We support our findings with software usage data and a case study.
Keywords
Design of Experiments
Process Optimization
OMARS designs
Robust Optimization
Cloud Software
Adoption Barriers
The American Statistical Association's Ethical Guidelines for Statistical Practice call for practitioners to recognize that statistical practice could adversely affect marginalized groups and to be mindful about adequately contextualizing information. In this talk, I discuss an example that engages students in analyzing data as well as reflecting on how findings from the analysis should be presented to avoid disproportionate harm. Students use data from the 2018 Massachusetts Comprehensive Assessment System (MCAS) exam and investigate evidence of achievement gaps in standardized test scores at the school level. Working through this analysis requires students to demonstrate understanding of several topics, including hypothesis testing and interaction terms in linear regression, as well as fluency with interpreting findings in context of the data. Students then read a perspective piece published in the New England Journal of Medicine calling for caution when reporting on Covid-19 racial health disparities and reflect on how the same sense of caution is important when presenting their analysis of poverty and race-based achievement gaps in MCAS scores.
Keywords
multivariate thinking
statistics education
introductory statistics
communication
In oncology clinical trials, Overall Response Rate (ORR) is often analyzed as either a primary or key secondary endpoint. In particular, durability of response is of interest. As such, Duration of Response (DOR) and Time to Response (TTR) provide supportive information to assess Best Overall Response (BOR). Conventionally, DOR is summarized for responders only: descriptive statistics based on Kaplan-Meier curves constructed with censored observations from the responders are used to summarize the DOR data. Alternative methods that utilize data from all patients randomized to the study treatments are becoming popular in the literature (Huang et al., Annals of Internal Medicine 2020; Huang & Tian, Pharmaceutical Statistics 2022; Weber et al., Pharmaceutical Statistics 2023). These methods are based on the mean (expected) duration of a patient being in response. The objective of this presentation is to explore different methodologies for analyzing DOR data and to evaluate and compare the performance of these methods.
Keywords
duration of response
estimands
restricted mean duration of response
restricted mean survival time
time to response
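The all-patient approach mentioned in the abstract above rests on the mean time a patient spends in response within a fixed window. The following is a minimal sketch of one common formulation, assuming the restricted mean DOR is estimated as the area between two Kaplan-Meier curves up to a horizon `tau`: the curve for time to progression or death, and the curve for time to the first of response, progression, or death. The function names, the record layout, the horizon, and the toy data are hypothetical and are not taken from the talk or the cited papers.

```python
from typing import List, Optional, Tuple

# (time to response or None, time to progression/death or censoring, 1 if progression/death observed)
Record = Tuple[Optional[float], float, int]


def km_rmst(times: List[float], events: List[int], tau: float) -> float:
    """Area under the Kaplan-Meier curve on [0, tau] (restricted mean)."""
    # Distinct observed event times no later than tau, in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= tau})
    surv, prev_t, area = 1.0, 0.0, 0.0
    for t in event_times:
        area += surv * (t - prev_t)  # rectangle under the current step
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - n_events / at_risk  # Kaplan-Meier product-limit update
        prev_t = t
    return area + surv * (tau - prev_t)  # last step out to the horizon


def restricted_mean_dor(records: List[Record], tau: float) -> float:
    """Mean time in response up to tau, using all randomized patients."""
    pfs_times = [p for _, p, _ in records]
    pfs_events = [e for _, _, e in records]
    first_times, first_events = [], []
    for resp, prog, event in records:
        if resp is not None:  # response came first: counts as an event
            first_times.append(resp)
            first_events.append(1)
        else:  # no response: progression/death or censoring
            first_times.append(prog)
            first_events.append(event)
    return km_rmst(pfs_times, pfs_events, tau) - km_rmst(first_times, first_events, tau)


# Hypothetical, fully observed toy data (months); two of four patients respond.
toy = [(2.0, 10.0, 1), (None, 4.0, 1), (3.0, 6.0, 1), (None, 12.0, 1)]
mean_dor = restricted_mean_dor(toy, tau=12.0)
print(round(mean_dor, 6))  # 2.75, the plain average of 8, 0, 3, 0 since nothing is censored
```

Non-responders contribute a duration of zero rather than being dropped, which is what separates this summary from the responders-only analysis.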
Inquiry-based activities allow students to explore questions they find interesting and applicable, motivating deeper engagement in a given task. Authentic data situates inquiry-based statistical explorations in meaningful contexts and advances the development of students' data acumen. We have designed an inquiry-based activity that authentically explores the distributions of rent measured as a percentage of income across the United States. It is the first activity in a series of three that offers students rich experiences exploring questions about authentic data through R Shiny applets. The activity's goal is to strengthen students' data exploration skills while furthering their understanding of the normal distribution. The learning objectives of calculating, interpreting, and drawing conclusions from z-scores, percentiles, and proportions in addition to standardizing and comparing normally distributed data are met. Statistical literacy is fostered as students become aware of social phenomena that are modeled by statistics and are situated in a context of societal and personal importance. Students will learn to standardize and compare normally distributed data in a context that is relevant to their lives, will gain experience taking the lead on statistically investigating a question that is interesting to them, and will practice communicating their results to others. This activity and the R Shiny applet will be discussed to demonstrate ways to situate statistical tasks in meaningful contexts.
Keywords
Inquiry-based
Authentic Data
Undergraduate Introductory Statistics
Statistics Education
The Normal Distribution
R Shiny Applets
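The learning objectives in the rent activity above (z-scores, percentiles, proportions, and comparing standardized values) can be illustrated with a short sketch. The means, standard deviations, and household value below are hypothetical placeholders, not figures from the activity or its data, and the normal model is assumed for illustration only.

```python
from statistics import NormalDist

# Hypothetical normal models for rent as a percentage of income in two regions.
region_a = NormalDist(mu=30.0, sigma=5.0)
region_b = NormalDist(mu=35.0, sigma=8.0)


def z_score(x: float, dist: NormalDist) -> float:
    """Number of standard deviations that x lies above the mean of dist."""
    return (x - dist.mean) / dist.stdev


household = 40.0  # a household spending 40% of income on rent (hypothetical)

z_a = z_score(household, region_a)  # (40 - 30) / 5
z_b = z_score(household, region_b)  # (40 - 35) / 8

percentile_a = region_a.cdf(household)      # share of region A at or below 40%
share_over_30_b = 1 - region_b.cdf(30.0)    # share of region B paying more than 30%

print(f"z in region A: {z_a:.3f}, z in region B: {z_b:.3f}")
print(f"percentile in region A: {percentile_a:.4f}")
print(f"proportion above 30% in region B: {share_over_30_b:.4f}")
```

The same 40% rent burden is more unusual in region A than in region B, which is the kind of comparison that standardizing makes possible.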
Abstracts
In an era where data-driven decision-making is paramount, cultivating a robust foundation in statistical understanding is essential for statisticians entering the workforce or already established, especially in the world that emerges from the pandemic. Based on our internal experiences facilitating a peer support group, we elucidate strategies employed for enhancing statistical literacy and the attributes contributing to effective communication within the statistical community. We identify the following factors as crucial to ensuring success: assigning mentors, investing in continuous training and education, fostering and maintaining communication, and developing quality assurance guidelines. In summary, the efforts to build resilient statistical communities contribute directly to informing policy and countering misinformation by fostering a culture of statistical literacy, data integrity, effective communication, and ethical considerations within the field of statistics. This, in turn, strengthens the foundation for evidence-based policymaking and promotes a more informed and resilient society.
Keywords
building resilient statistical communities
mentorship excellence
statistical literacy
A new course was developed that partnered with a non-profit and tabulated, organized, and analyzed their data to answer key questions and help them better serve Iowa communities. In recent years there has been a call for statistics and data science to be learned and practiced by diverse groups, and to be culturally relevant. These principles guided the development and implementation of this course. The community partner Waypoint works with Iowa's houseless families and was interested in differences by race and/or gender of persistent rental placements. To prepare second year undergraduate students, with no prerequisite courses, to answer these types of questions, the course's learning objectives included descriptive statistics, history of race and gender housing inequities, and communicating statistical results, among others. Students listened to podcasts, attended talks, played a board game, and met people in the community, to learn about and understand the current and historical realities of housing inequities in the United States. The course is published at https://bit.ly/STA200.
Keywords
Justice
Inequity
Development
In Fall 2020, the University of Nebraska at Omaha (UNO) successfully introduced a general education quantitative literacy course fusing workforce-critical data science skills with service learning. Seeking to build on UNO's existing success, the University of Washington Tacoma (UWT) is creating their own version of the course for Spring 2024, with researchers collaborating to revise, implement, and assess it in both environments. At UNO, the proposed model contributed to increased data literacy among participants from a broad variety of majors by helping them develop fundamental mathematical, quantitative, and data literacy competencies in ways that are accessible and engaging, while increasing the capacity of local non-profit organizations to use data to answer meaningful questions to further their missions. We predict similar outcomes for UWT and its community, where we expect to find an increase in positive perceptions of mathematics and data science, particularly for non-STEM affiliated students who typically have lower interest and self-efficacy in mathematics and are often from groups underrepresented in STEM. Analysis of data collected at both institutions will be presented.
Keywords
service learning
data literacy
community engagement
high impact practices
underrepresentation in STEM
data visualization and presentation
I'll discuss the innovative integration of Shiny apps as powerful tools to enhance student engagement and understanding in introductory statistics courses. Leveraging real-time data from in-class student polls, I demonstrate how intentionally designed simulations presented via Shiny apps can be utilized to dynamically display, manipulate, and simulate sampling distributions using data that students have a personal connection to. By establishing a direct connection between students and the material, this approach creates an active and immersive learning experience. The session will include a live demonstration of a Shiny app used in Introduction to Statistics for Engineers at Oregon State, showcasing its functionalities and impact on student engagement. I'll share insights gained from developing and teaching with the app, providing valuable lessons learned and practical considerations for educators looking to implement similar techniques. I'll conclude by proposing additional introductory statistics topics that could benefit from this innovative teaching approach, encouraging further exploration and adoption of interactive and immersive tools in statistical education.
Keywords
Active Learning Strategy
Higher education
Statistical education
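The idea in the abstract above, simulating a sampling distribution from in-class poll data, can be sketched outside of Shiny in a few lines. This is only an illustration under stated assumptions: the poll values are hypothetical, the poll is treated as the population, and samples are drawn with replacement. It is not the app demonstrated in the talk.

```python
import random
from statistics import mean, stdev
from typing import List


def sampling_distribution(population: List[float], sample_size: int,
                          reps: int, seed: int = 0) -> List[float]:
    """Means of `reps` random samples drawn with replacement from `population`."""
    rng = random.Random(seed)  # fixed seed so the class sees reproducible output
    return [mean(rng.choices(population, k=sample_size)) for _ in range(reps)]


# Hypothetical poll responses: hours of sleep reported by students last night.
poll = [5, 6, 6, 7, 7, 7, 8, 8, 9, 4, 6.5, 7.5]

sample_means = sampling_distribution(poll, sample_size=10, reps=2000)

print(f"poll mean: {mean(poll):.2f}, poll sd: {stdev(poll):.2f}")
print(f"mean of sample means: {mean(sample_means):.2f}")
print(f"sd of sample means: {stdev(sample_means):.2f}")
```

Changing `sample_size` and rerunning shows the spread of the sample means shrinking as the sample grows, which is the behavior an interactive slider would let students explore.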
Empowering students to produce insight by engaging and working with data requires that we support their building of powerful and productive ways of computational thinking. Through task-based interviews, we seek to understand the ways in which computational thinking appears as part of individuals' thinking as they engage in data-ing (data exploration, analysis, and communication) and the similarities and differences between individuals along an expert-novice continuum. We analyzed transcripts of these interviews using grounded theory techniques and models from the literature. In our results, we describe our participants' conceptualization of computational thinking, specifically highlighting the notion of trade-offs and adapting existing code. We also describe some key observations within data-ing, including participants working with the data file format, the hierarchical classification embedded in the variable names, and the construction of visualizations. After comparing our results to dimensions of existing models, we propose our own framework which highlights aspects of computational thinking, data-ing, and resource, and we consider implications for research and teaching.
Keywords
computational thinking
data
expert-novice
statistics education
coding
Introductory data science classes cover a range of topics, including data gathering, exploration, modeling, and visualization. However, data science is still a young discipline, which means little is known about which topics students particularly struggle with.
This paper analyzes student data from three interactive, online data science textbooks. Activity metrics like average number of attempts, proportion of students giving up, and average time to completion, will be used to quantify student struggle. Struggle data from conceptual and programming-based activities will be aggregated from over 50 institutions to identify challenging topics in a first data science course. Data will also be compared between book versions to determine if certain tasks are more difficult in Python or R, or if programming language does not affect performance. Although specific activities are limited to a single course platform, challenging topics and lessons learned will apply broadly.
Keywords
data science
student struggle
online learning
interactive textbooks
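The three activity metrics named in the abstract above (average number of attempts, proportion of students giving up, and average time to completion) can be sketched as follows. The record layout, the field names, and the definition of "giving up" as attempting an activity without completing it are assumptions made for illustration. The abstract does not specify how the textbook platform logs or defines them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ActivityRecord:
    """One student's interaction with one activity (hypothetical layout)."""
    student_id: str
    attempts: int
    completed: bool
    seconds_to_complete: Optional[float]  # None when the activity was not completed


def struggle_metrics(records: List[ActivityRecord]) -> Dict[str, Optional[float]]:
    """Summarize struggle on a single activity across students."""
    if not records:
        raise ValueError("at least one record is required")
    n = len(records)
    finished = [r.seconds_to_complete for r in records
                if r.completed and r.seconds_to_complete is not None]
    return {
        "avg_attempts": sum(r.attempts for r in records) / n,
        # Assumption: a student "gave up" if they attempted but never completed.
        "prop_gave_up": sum(1 for r in records if not r.completed) / n,
        # Averaged over completers only; None when nobody finished.
        "avg_seconds_to_complete": sum(finished) / len(finished) if finished else None,
    }


# Hypothetical records for one activity.
records = [
    ActivityRecord("s1", attempts=1, completed=True, seconds_to_complete=30.0),
    ActivityRecord("s2", attempts=3, completed=True, seconds_to_complete=90.0),
    ActivityRecord("s3", attempts=5, completed=False, seconds_to_complete=None),
    ActivityRecord("s4", attempts=2, completed=True, seconds_to_complete=60.0),
]
metrics = struggle_metrics(records)
print(metrics)
```

Computing the same summary per activity and per book version would allow the Python and R comparison the abstract describes.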
It is well known that open data sharing (such as raw csv, Excel or pdf files) contributes to new collaboration and promotes reproducibility and improved analysis.
Instead of sharing just the data set and codebook, we propose publishing a full exploratory data analysis using graphs and explanatory text on a Quarto/R Markdown based website. This in-depth data presentation/visualization provides researchers from various disciplines a clear, accessible and more efficient way to navigate through numerous variables, survey questions and the results. The data set used in this example is from the statewide Basic Needs Student Success Survey, administered by the Center for Healthy Communities, the Prime Contractor for basic needs services on over 50 CA campuses.
These efforts have helped spark collaboration among researchers across the nation, leading to further analyses, publications, and impact. This approach can also provide a hands-on learning experience for undergraduate students to implement their classroom-derived data handling knowledge in a real world setting.
In this presentation we will present the methods, integrations and lessons learned from sharing data in this manner.
Keywords
Collaboration
Data Sharing
Visualization
Survey Data
It is well known that students often feel anxiety and fear when they enter elementary statistics courses in college. This can lead to a frustrating and stressful learning environment. Incorporating writing components into an undergraduate statistics course is not a brand new concept. Several publications have already shown the benefits of writing assignments in a statistics course. However, there is little literature on introducing professional writing in a statistics course at an early stage in college. In this research, we will discuss the benefits and challenges of introducing professional writing in an introductory statistics course, and provide some strategies that make the writing process productive and painless for both students and instructors.
Keywords
Professional writing
Introductory statistics
Learning environment
Observational study
In response to the increased demand for computational skills in internships and careers in statistics, a course in data wrangling, database management, and data visualization was introduced to undergraduate majors as part of the statistics curriculum. The course introduces topics on programming and data structures, web scraping, data wrangling, data normalization, SQL, database management, and data visualization. The course is taught in a computer lab and students engage with different software and tools in and out of the classroom. In addition to weekly content quizzes, students take an in-class practical exam and complete a semester-long group project utilizing real datasets. Software and tools such as R, SQL, MySQL and Tableau are used, and the course is taught using open educational resources. In this presentation I will discuss activities that were implemented in the course, and the success stories of student engagement in this computational experience.
Keywords
Computational Skills
R
SQL
MySQL
Tableau

The Harvard Undergraduate Forestry Data Science Lab (UFDS), in collaboration with the US Forest Service, provides undergraduates the opportunity to learn and apply statistical and data science skills to real-world research projects. The ten-week UFDS summer program provides collaborative research experiences that focus on: working with peers from different backgrounds, project stakeholders, and Forest Service Research Scientists; presenting science to technical and non-technical audiences; developing data science skills such as data visualization, model diagnostics, data wrangling, and code reviews; and reading and writing scientific documents and articles. This talk provides an overview of the training program, lessons learned about providing meaningful and impactful learning experiences to undergraduate students in data science, working with undergraduates with diverse backgrounds and skills, and building environments in which students can strengthen their identity and confidence as a statistician, data scientist, and human.
Keywords
undergraduate research
collaboration
sense of belonging
code reviews
small area estimation
survey statistics
While there has been considerable work on guiding educators on how to structure a course in data science for imparting technical knowledge (e.g. Hicks and Irizarry (2018)), we argue, based on employer feedback and industry relations, that a larger part of the curriculum needs to be devoted to problem formulation, deployment, solution design, model monitoring and communication of results. Emphasising these practical aspects imposes new requirements on the instructor and the coordinating department. An example of the demand on instructors is the breadth of knowledge they are required to have. The department, on the other hand, needs a steady stream of case studies for students to work on; this need is exacerbated by increasing class sizes. In this talk we present our observations and thoughts on these challenges, based on our experience of teaching these topics over 4 semesters to more than 400 students (and growing).
1. Hicks, Stephanie C., and Rafael A. Irizarry. "A guide to teaching data science." The American Statistician 72, no. 4 (2018): 382
Keywords
data science
practice
end-to-end
teaching
curriculum
syllabus
This paper presents the second phase of the Temporal Metrics project, an innovative exploration (using mathematics and AI augmented research) into the human perception of time using deep learning methodologies. Building on the foundational development of the Cr constant - a novel metric quantifying time perception variations - this phase extends the application to a deep learning model. The model predicts individual time perception categories - "Average," "Slower," or "Faster" - based on a comprehensive array of conditions and lifestyle factors, each weighted by the associated Cr value. The data for this study, derived from a theoretical sample representing 0.001% of the U.S. adult population, encompass demographic information, psychological conditions, lifestyle factors, and substance use. This project highlights the potent combination of theoretical constructs with advanced machine learning techniques, offering groundbreaking insights into the subjective experience of time. Our results demonstrate the model's high accuracy in predicting time perception categories, paving the way for future empirical research and potential applications in behavioral monitoring and mental health.
Keywords
Time Perception
Deep Learning
Weber's Law
Chat GPT 4.0
AI Augmented Research
Human time perception
There has been robust research to understand the relationship between a country's economic performance (typically measured in GDP per capita) and the mathematical proficiency of its students (typically assessed by a standardized math score). Current research consistently shows a strong positive correlation between these two measures at a global level. The hypothesis of this paper is that this correlation weakens once an economy surpasses a certain threshold. Specifically, our research focuses on the United States and examines this relationship across its 50 states. We utilize each state's Grade 8 average math score from the National Assessment of Educational Progress (NAEP) and GDP per capita to investigate this relationship. Data visualization and statistical inference are used to quantify and reveal the relationship between these two measures. As hypothesized, the correlation in the United States is significantly diminished compared to what previous work has shown at the global level. This work will help policy makers understand the complex relationship between the economy and math performance in order to develop more effective strategies to enhance education.
Keywords
math proficiency
GDP per capita
correlation
National Assessment of Educational Progress
education policy
data visualization
Data visualization enables people to convey complicated information in visual format, but we do not know what types of visualizations help or hurt interpretation. Using the AmeriSpeak Omnibus survey, a biweekly nationally representative survey of respondents from a probability-based survey panel, we examined user understanding and interpretation of different chart types. We asked panelists in three rounds of the survey the same questions targeting their understanding of the visual information presented to them while varying the design of the chart each time. Participants were asked to estimate specific values shown and determine if certain statements were supported by the data displayed in the chart. This research analyzes participants' accuracy and identifies differences in response patterns by chart type and across population subgroups. Our findings will be used to improve data visualization practices and provide key insights about graphical literacy among U.S. adults.
Keywords
Data Visualization
Visual Literacy
Charts