SPEED 6: Statistics in Education & Applications, Part 1

Elizabeth Handorf Chair
Rutgers University, Rutgers Cancer Institute of New Jersey
 
Tuesday, Aug 6: 8:30 AM - 10:20 AM
5079 
Contributed Speed 
Oregon Convention Center 
Room: CC-E141 

Main Sponsor

Section on Statistics and Data Science Education

Presentations

A Shiny App for Customizing Course Materials for Logistic Regression

At my institution, logistic regression appears in the undergraduate statistics curriculum in a variety of courses: the second course in statistics, a course in categorical data analysis, and a course in advanced regression models. In each course, there are variations in student audience and level, software, notation, and vocabulary used when covering this topic. In this talk, I will demonstrate a Shiny app for customizing a core set of materials on logistic regression based on these characteristics of the course. 

Keywords

educational materials

statistics education 

View Abstract 3168

First Author

Amy Froelich, Iowa State University

Presenting Author

Amy Froelich, Iowa State University

Adoption barriers for the use of Design of Experiments

Design of Experiments (DoE) is a valuable tool for the pharmaceutical industry to achieve a stable product and a robust process, which translates into a reduced time to market. In particular, the DoE methodology helps to identify the critical parameters and their robust operating ranges. DoE is a generic methodology that can be applied to many other sectors, such as chemicals, manufacturing, or food. However, the use of DoE is not widespread in these sectors. Several authors have identified barriers to the widespread use of DoE, such as a lack of teamworking skills, a perception that it requires more resources than the traditional approach, or the complexity of current DoE software tools. We present guidelines for overcoming several barriers to DoE adoption and illustrate them with a cloud-based DoE software solution. We show how new tools can improve collaboration and lower the barrier to entry for non-expert users. We support our findings with software usage data and a case study.
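To make the basic DoE idea concrete for non-expert readers, the sketch below builds a two-level full factorial design and estimates each factor's main effect. It is a generic textbook illustration, not the OMARS designs or the cloud-based tool discussed in the talk; the response model in the example is hypothetical and noise-free.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a two-level full factorial design, factors coded -1/+1."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def main_effects(design, response):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    effects = []
    for j in range(len(design[0])):
        high = [y for run, y in zip(design, response) if run[j] == 1]
        low = [y for run, y in zip(design, response) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


if __name__ == "__main__":
    design = full_factorial(3)  # 8 runs for 3 factors
    # Hypothetical noise-free response in which factor 3 has no effect.
    response = [10 + 3 * x1 - 2 * x2 for x1, x2, _x3 in design]
    print(main_effects(design, response))  # [6.0, -4.0, 0.0]
```

Each main effect equals twice the corresponding coefficient in the coded model, which is why the coefficient of 3 on the first factor appears as an effect of 6.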

Keywords

Design of Experiments

Process Optimization

OMARS designs

Robust Optimization

Cloud Software

Adoption Barriers 

View Abstract 2969

Co-Author

Jose Nunez Ares, EFFEX

First Author

Dewi Van De Vyver

Presenting Author

Dewi Van De Vyver

An example of combining data analysis with ethics-oriented reflection

The American Statistical Association's Ethical Guidelines for Statistical Practice call for practitioners to recognize that statistical practice could adversely affect marginalized groups and to be mindful about adequately contextualizing information. In this talk, I discuss an example that engages students in analyzing data as well as reflecting on how findings from the analysis should be presented to avoid disproportionate harm. Students use data from the 2018 Massachusetts Comprehensive Assessment System (MCAS) exam and investigate evidence of achievement gaps in standardized test scores at the school level. Working through this analysis requires students to demonstrate understanding of several topics, including hypothesis testing and interaction terms in linear regression, as well as fluency in interpreting findings in the context of the data. Students then read a perspective piece published in the New England Journal of Medicine calling for caution when reporting on Covid-19 racial health disparities and reflect on why the same caution is important when presenting their analysis of poverty- and race-based achievement gaps in MCAS scores.
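A minimal sketch of the kind of regression with an interaction term the activity calls for, assuming a hypothetical school-level model of mean score on a poverty share, a binary group indicator, and their product. The data are synthetic, not MCAS data, and the true coefficients are made up; the t-statistic on the product term is what tests whether the gap between groups changes with poverty.

```python
import numpy as np


def fit_interaction_model(poverty, group, score):
    """OLS fit of score ~ poverty + group + poverty:group.

    Returns the coefficient vector (intercept, poverty, group, interaction)
    and the corresponding t-statistics.
    """
    X = np.column_stack([np.ones_like(poverty), poverty, group, poverty * group])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    resid = score - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)  # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se


if __name__ == "__main__":
    # Synthetic school-level data; all true coefficients are hypothetical.
    rng = np.random.default_rng(0)
    n = 400
    poverty = rng.uniform(0, 1, n)               # share of students in poverty
    group = rng.integers(0, 2, n).astype(float)  # hypothetical binary school indicator
    score = 500 - 20 * poverty - 5 * group - 10 * poverty * group + rng.normal(0, 1, n)
    beta, t = fit_interaction_model(poverty, group, score)
    print(np.round(beta, 1), np.round(t[3], 1))
```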

Keywords

multivariate thinking

statistics education

introductory statistics

communication 

View Abstract 2492

First Author

Julie Vu

Presenting Author

Julie Vu

Analysis of Duration of Response Using Conventional and Alternative Approaches

In oncology clinical trials, Overall Response Rate (ORR) is often analyzed as either a primary or a key secondary endpoint. In particular, durability of response is of interest. As such, Duration of Response (DOR) and Time to Response (TTR) provide supportive information to assess Best Overall Response (BOR). Conventionally, DOR is summarized for responders only: descriptive statistics based on Kaplan-Meier curves constructed with censored observations from the responders are used to summarize the DOR data. Alternative methods that utilize data from all patients randomized to the study treatments are becoming popular in the literature (Huang et al., Annals of Internal Medicine 2020; Huang & Tian, Pharmaceutical Statistics 2022; Weber et al., Pharmaceutical Statistics 2023). These methods are based on the mean (expected) duration of a patient being in response. The objective of this presentation is to explore different methodologies for analyzing DOR data and to evaluate and compare the performance of these methods.
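As an illustration of the conventional responder-only analysis, the sketch below computes a Kaplan-Meier curve for duration of response and its restricted mean up to a truncation time tau (the area under the curve from 0 to tau). The times, the event coding (1 = response ended by progression or death, 0 = censored), and tau are hypothetical; the all-patient methods cited above target a different estimand and are not implemented here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time.

    times: time from response onset; events: 1 = response ended, 0 = censored.
    Censored observations tied with an event time stay in the risk set.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        ended, removed = 0, 0
        while i < len(data) and data[i][0] == t:
            ended += data[i][1]
            removed += 1
            i += 1
        if ended:
            surv *= 1 - ended / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def restricted_mean(curve, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += (t - prev_t) * prev_s
        prev_t, prev_s = t, s
    return area + (tau - prev_t) * prev_s


if __name__ == "__main__":
    # Hypothetical responders: months from response onset to progression/death or censoring.
    curve = kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])
    print(curve)
    print(restricted_mean(curve, tau=6))  # about 4.3 months
```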

Keywords

duration of response

estimands

restricted mean duration of response

restricted mean survival time

time to response 

View Abstract 3269

Co-Author(s)

Prabhu Bhagavatheeswaran, Bristol Myers Squibb
Jixian Wang, Bristol Myers Squibb
Ram Tiwari, Bristol Myers Squibb

First Author

Apurva Bhingare, Bristol Myers Squibb

Presenting Author

Apurva Bhingare, Bristol Myers Squibb

Authentic Data Explorations: Investigating the Normal Distribution through Comparative Rent Data

Inquiry-based activities allow students to explore questions they find interesting and applicable, motivating deeper engagement in a given task. Authentic data situates inquiry-based statistical explorations in meaningful contexts and advances the development of students' data acumen. We have designed an inquiry-based activity that authentically explores the distributions of rent measured as a percentage of income across the United States. It is the first activity in a series of three that offers students rich experiences exploring questions about authentic data through R Shiny applets. The activity's goal is to strengthen students' data exploration skills while furthering their understanding of the normal distribution. The activity addresses learning objectives that include calculating, interpreting, and drawing conclusions from z-scores, percentiles, and proportions, as well as standardizing and comparing normally distributed data. Statistical literacy is fostered as students become aware of social phenomena that are modeled by statistics and situated in a context of societal and personal importance. Students will learn to standardize and compare normally distributed data in a context that is relevant to their lives, will gain experience taking the lead on statistically investigating a question that is interesting to them, and will practice communicating their results to others. This activity and the R Shiny applet will be discussed to demonstrate ways to situate statistical tasks in meaningful contexts.
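A short sketch of the z-score, percentile, and proportion calculations the activity targets, assuming rent burden (rent as a percentage of income) is approximately normal within each region. The means and standard deviations used below are hypothetical values, not figures from the activity's data.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standardized distance of x from the mean, in standard deviations."""
    return (x - mean) / sd


def percentile(x, mean, sd):
    """Percentile of x under a Normal(mean, sd) model."""
    return 100 * NormalDist(mean, sd).cdf(x)


def proportion_above(x, mean, sd):
    """Proportion of the Normal(mean, sd) distribution above x."""
    return 1 - NormalDist(mean, sd).cdf(x)


if __name__ == "__main__":
    # Hypothetical rent burdens (% of income) in two regions.
    print(z_score(35, mean=30, sd=5))  # 1.0 for a region A household
    print(z_score(33, mean=25, sd=4))  # 2.0: the region B household is relatively more burdened
    print(round(percentile(35, 30, 5), 1))        # 84.1
    print(round(proportion_above(40, 30, 5), 4))  # 0.0228
```

Standardizing is what makes the comparison fair: the household paying 33% in region B sits further into its own region's upper tail than the household paying 35% in region A.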

Keywords

Inquiry-based

Authentic Data

Undergraduate Introductory Statistics

Statistics Education

The Normal Distribution

R Shiny Applets 


Co-Author(s)

Justin Post, North Carolina State University
Jennifer Green, Michigan State University
Sunghwan Byun, North Carolina State University

First Author

Maria Cruciani, Michigan State University

Presenting Author

Maria Cruciani, Michigan State University

Building Resilient Statistical Communities: Promoting Statistical Literacy and Mentorship Excellence

In an era where data-driven decision-making is paramount, cultivating a robust foundation in statistical understanding is essential for statisticians, whether entering the workforce or already established, especially in the world that has emerged from the pandemic. Based on our experience facilitating an internal peer support group, we describe strategies for enhancing statistical literacy and the attributes that contribute to effective communication within the statistical community. We identify the following factors as crucial to success: assigning mentors, investing in continuous training and education, fostering and maintaining communication, and developing quality assurance guidelines. In summary, efforts to build resilient statistical communities contribute directly to informing policy and countering misinformation by fostering a culture of statistical literacy, data integrity, effective communication, and ethical considerations within the field of statistics. This, in turn, strengthens the foundation for evidence-based policymaking and promotes a more informed and resilient society.

Keywords

building resilient statistical communities

mentorship excellence

statistical literacy 

View Abstract 2610

Co-Author

Anna Giczewska

First Author

Miloni Shah, Duke Clinical Research Institute

Presenting Author

Miloni Shah, Duke Clinical Research Institute

Course: Statistics for Social Justice

A new course was developed in partnership with a non-profit organization; students tabulated, organized, and analyzed the organization's data to answer key questions and help it better serve Iowa communities. In recent years, there has been a call for statistics and data science to be learned and practiced by diverse groups and to be culturally relevant. These principles guided the development and implementation of this course. The community partner, Waypoint, works with Iowa's houseless families and was interested in differences by race and/or gender in persistent rental placements. To prepare second-year undergraduate students with no prerequisite courses to answer these types of questions, the course's learning objectives included descriptive statistics, the history of race and gender housing inequities, and communicating statistical results, among others. Students listened to podcasts, attended talks, played a board game, and met people in the community to learn about and understand the current and historical realities of housing inequities in the United States. The course is published at https://bit.ly/STA200.

Keywords

Justice

Inequity

Development 

View Abstract 3691

First Author

Tyler George

Presenting Author

Tyler George

Data Literacy & Visualization: Improving STEM Education Through Service Learning at Two Institutions

In Fall 2020, the University of Nebraska at Omaha (UNO) successfully introduced a general education quantitative literacy course fusing workforce-critical data science skills with service learning. Seeking to build on UNO's existing success, the University of Washington Tacoma (UWT) is creating its own version of the course for Spring 2024, with researchers collaborating to revise, implement, and assess it in both environments. At UNO, the proposed model contributed to increased data literacy among participants from a broad variety of majors by helping them develop fundamental mathematical, quantitative, and data literacy competencies in ways that are accessible and engaging, while increasing the capacity of local non-profit organizations to use data to answer meaningful questions that further their missions. We predict similar outcomes for UWT and its community, where we expect to find an increase in positive perceptions of mathematics and data science, particularly for non-STEM students, who typically have lower interest and self-efficacy in mathematics and are often from groups underrepresented in STEM. Analysis of data collected at both institutions will be presented.

Keywords

service learning

data literacy

community engagement

high impact practices

underrepresentation in STEM

data visualization and presentation 

View Abstract 3699

Co-Author(s)

Betty Love, University of Nebraska-Omaha
Michelle Friend, University of Nebraska-Omaha
Becky Brusky, University of Nebraska-Omaha
Julie Dierberger, University of Nebraska-Omaha

First Author

Zaher Kmail, University of Washington-Tacoma

Presenting Author

Zaher Kmail, University of Washington-Tacoma

Enhancing Engagement in Introductory Statistics through Student-Centered Simulations

I'll discuss the innovative integration of Shiny apps as powerful tools to enhance student engagement and understanding in introductory statistics courses. Leveraging real-time data from in-class student polls, I demonstrate how intentionally designed simulations presented via Shiny apps can be utilized to dynamically display, manipulate, and simulate sampling distributions using data that students have a personal connection to. By establishing a direct connection between students and the material, this approach creates an active and immersive learning experience. The session will include a live demonstration of a Shiny app used in Introduction to Statistics for Engineers at Oregon State, showcasing its functionalities and impact on student engagement. I'll share insights gained from developing and teaching with the app, providing valuable lessons learned and practical considerations for educators looking to implement similar techniques. I'll conclude by proposing additional introductory statistics topics that could benefit from this innovative teaching approach, encouraging further exploration and adoption of interactive and immersive tools in statistical education. 
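The app itself is built in Shiny, but its core simulation logic can be sketched in Python: treat a class poll as the population, repeatedly draw samples of size n, and record each sample mean. The poll values, the sample size, and the choice to sample with replacement are assumptions for illustration, not details of the Oregon State app.

```python
import random
import statistics


def simulate_sampling_distribution(population, n, reps, seed=None):
    """Means of `reps` samples of size n drawn with replacement from `population`."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]


if __name__ == "__main__":
    # Hypothetical class poll: hours of sleep last night.
    poll = [5, 6, 6, 7, 7, 7, 8, 8, 9, 10]
    means = simulate_sampling_distribution(poll, n=5, reps=5000, seed=1)
    print(statistics.fmean(means), statistics.stdev(means))
    print(statistics.pstdev(poll) / 5 ** 0.5)  # theoretical standard error
```

Because samples are drawn with replacement, the standard deviation of the simulated means should land close to the poll's standard deviation divided by the square root of n, which gives students a concrete check on the standard error formula.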

Keywords

Active Learning Strategy

Higher education

Statistical education 

View Abstract 3722

First Author

Erin Howard, Oregon State University

Presenting Author

Erin Howard, Oregon State University

Exploring How Novices and Experts Engage in Computational Thinking with Data

Empowering students to produce insight by engaging and working with data requires that we support their building of powerful and productive ways of computational thinking. Through task-based interviews, we seek to understand the ways in which computational thinking appears as part of individuals' thinking as they engage in data-ing (data exploration, analysis, and communication) and the similarities and differences between individuals along an expert-novice continuum. We analyzed transcripts of these interviews using grounded theory techniques and models from the literature. In our results, we describe our participants' conceptualization of computational thinking, specifically highlighting the notion of trade-offs and adapting existing code. We also describe some key observations within data-ing, including participants working with the data file format, the hierarchical classification embedded in the variable names, and the construction of visualizations. After comparing our results to dimensions of existing models, we propose our own framework which highlights aspects of computational thinking, data-ing, and resource, and we consider implications for research and teaching. 

Keywords

computational thinking

data

expert-novice

statistics education

coding 

View Abstract 3652

Co-Author(s)

Neil Hatfield, Pennsylvania State University
Matthew Beckman, Penn State University

First Author

Alyssa Hu, Penn State University

Presenting Author

Alyssa Hu, Penn State University

Exploring student struggle in introductory data science courses

Introductory data science classes cover a range of topics, including data gathering, exploration, modeling, and visualization. However, data science is still a young discipline, which means little is known about which topics students particularly struggle with.

This paper analyzes student data from three interactive, online data science textbooks. Activity metrics such as the average number of attempts, the proportion of students giving up, and the average time to completion will be used to quantify student struggle. Struggle data from conceptual and programming-based activities will be aggregated from over 50 institutions to identify challenging topics in a first data science course. Data will also be compared between book versions to determine whether certain tasks are more difficult in Python or R, or whether the programming language has no effect on performance. Although specific activities are limited to a single course platform, the challenging topics and lessons learned will apply broadly.
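A minimal sketch of how the three struggle metrics might be computed per activity from interaction logs. The record layout (student, activity, attempts, completed flag, minutes) and the activity names are hypothetical, not the zyBooks data schema; averaging time to completion over completers only is one of several reasonable choices.

```python
from collections import defaultdict
from statistics import fmean


def struggle_metrics(records):
    """Per-activity struggle metrics.

    records: iterable of (student_id, activity_id, attempts, completed, minutes).
    """
    by_activity = defaultdict(list)
    for _student, activity, attempts, completed, minutes in records:
        by_activity[activity].append((attempts, completed, minutes))

    summary = {}
    for activity, rows in by_activity.items():
        completion_times = [m for _, done, m in rows if done]
        summary[activity] = {
            "avg_attempts": fmean(a for a, _, _ in rows),
            # Students who never completed the activity are counted as giving up.
            "prop_gave_up": sum(1 for _, done, _ in rows if not done) / len(rows),
            "avg_time_to_completion": fmean(completion_times) if completion_times else None,
        }
    return summary


if __name__ == "__main__":
    logs = [
        ("s1", "joins", 3, True, 12.0),
        ("s2", "joins", 5, False, 20.0),
        ("s3", "joins", 1, True, 4.0),
        ("s1", "pivot", 1, True, 3.0),
    ]
    print(struggle_metrics(logs))
```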

Keywords

data science

student struggle

online learning

interactive textbooks 

View Abstract 3569

Co-Author

Pamela Fellers, Wiley/zyBooks

First Author

Aimee Schwab-McCoy, zyBooks

Presenting Author

Aimee Schwab-McCoy, zyBooks

Fostering collaboration through innovative data sharing using Quarto/R Markdown based websites

It is well known that open data sharing (such as raw CSV, Excel, or PDF files) fosters new collaboration and promotes reproducibility and improved analysis.
Instead of sharing just the data set and codebook, we propose publishing a full exploratory data analysis, with graphs and explanatory text, on a Quarto/R Markdown based website. This in-depth data presentation and visualization gives researchers from various disciplines a clear, accessible, and more efficient way to navigate the numerous variables, survey questions, and results. The data set used in this example is from the statewide Basic Needs Student Success Survey, administered by the Center for Healthy Communities, the Prime Contractor for basic needs services on over 50 CA campuses.
These efforts have helped spark collaboration among researchers across the nation, leading to further analyses, publications, and impact. This approach can also provide a hands-on learning experience for undergraduate students to apply their classroom-derived data handling knowledge in a real-world setting.
In this presentation, we will describe the methods, integrations, and lessons learned from sharing data in this manner.

Keywords

Collaboration

Data Sharing

Visualization

Survey Data 

View Abstract 3026

Co-Author(s)

Robin Donatello, California State University, Chico
Shady Shamy, Center for Healthy Communities
Stephanie Bianco, Center for Healthy Communities

First Author

Saul Mooradian

Presenting Author

Saul Mooradian

Introducing professional writing in an introductory statistics course

It is well known that students often feel anxiety and fear when they enter elementary statistics courses in college. This can lead to a frustrating and stressful learning environment. Incorporating writing components into an undergraduate statistics course is not a new concept, and several publications have already shown the benefits of writing assignments in statistics courses. However, there is little literature on introducing professional writing in a statistics course at an early stage in college. In this research, we discuss the benefits and challenges of introducing professional writing in an introductory statistics course and provide some strategies that make the writing process productive and painless for both students and instructors.

Keywords

Professional writing


Introductory statistics

Learning environment

Observational study 

View Abstract 2619

Co-Author(s)

Yi Shang, John Carroll University
Meredith Steck, John Carroll University

First Author

Shurong Fang, John Carroll University

Presenting Author

Shurong Fang, John Carroll University

Response to Industry Demand for Computational Skills in Statistics Majors

In response to the increased demand for computational skills in internships and careers in statistics, a course in data wrangling, database management, and data visualization was introduced to undergraduate majors as part of the statistics curriculum. The course introduces topics on programming and data structures, web scraping, data wrangling, data normalization, SQL, database management, and data visualization. The course is taught in a computer lab, and students engage with different software and tools in and out of the classroom. In addition to weekly content quizzes, students take an in-class practical exam and complete a semester-long group project using real datasets. Software and tools such as R, SQL, MySQL, and Tableau are used, and the course is taught using open educational resources. In this presentation, I will discuss activities that were implemented in the course and success stories of student engagement in this computational experience.
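To illustrate the data normalization and SQL topics, the sketch below splits a hypothetical flat table of enrollments into students, courses, and enrollments tables, then joins them back to count students per course. It uses Python's built-in sqlite3 module so it runs without a database server; the course itself uses MySQL, whose syntax differs in places (for example, SQLite's INSERT OR IGNORE corresponds to MySQL's INSERT IGNORE).

```python
import sqlite3

# Hypothetical flat export: course titles repeat on every enrollment row.
FLAT_ROWS = [
    ("Ana", "STA101", "Intro Statistics", "A"),
    ("Ana", "STA210", "Data Wrangling", "B"),
    ("Ben", "STA210", "Data Wrangling", "A"),
]

SCHEMA = """
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id),
    course_id TEXT REFERENCES courses(course_id),
    grade TEXT,
    PRIMARY KEY (student_id, course_id)
);
"""


def build_normalized_db(rows):
    """Load flat rows into three normalized tables in an in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    for name, course_id, title, grade in rows:
        # Each student and course is stored once; duplicates are skipped.
        con.execute("INSERT OR IGNORE INTO students(name) VALUES (?)", (name,))
        con.execute("INSERT OR IGNORE INTO courses(course_id, title) VALUES (?, ?)",
                    (course_id, title))
        # Look up the student's surrogate key when recording the enrollment.
        con.execute("INSERT INTO enrollments(student_id, course_id, grade) "
                    "SELECT student_id, ?, ? FROM students WHERE name = ?",
                    (course_id, grade, name))
    con.commit()
    return con


def enrollment_counts(con):
    """Join enrollments back to course titles and count students per course."""
    return con.execute(
        "SELECT c.title, COUNT(*) FROM enrollments AS e "
        "JOIN courses AS c ON c.course_id = e.course_id "
        "GROUP BY c.title ORDER BY c.title"
    ).fetchall()


if __name__ == "__main__":
    con = build_normalized_db(FLAT_ROWS)
    print(enrollment_counts(con))  # [('Data Wrangling', 2), ('Intro Statistics', 1)]
```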

Keywords

Computational Skills

R

SQL

MySQL

Tableau 

View Abstract 2816

First Author

Rasitha Jayesekere, Butler University

Presenting Author

Rasitha Jayesekere, Butler University

Teaching and Learning via Undergraduate Research: Notes from the Harvard Forestry Data Science Lab


The Harvard Undergraduate Forestry Data Science Lab (UFDS), in collaboration with the US Forest Service, provides undergraduates the opportunity to learn and apply statistical and data science skills to real-world research projects. The ten-week UFDS summer program provides collaborative research experiences that focus on: working with peers from different backgrounds, project stakeholders, and Forest Service Research Scientists; presenting science to technical and non-technical audiences; developing data science skills such as data visualization, model diagnostics, data wrangling, and code reviews; and reading and writing scientific documents and articles. This talk provides an overview of the training program, lessons learned about providing meaningful and impactful learning experiences to undergraduate students in data science, working with undergraduates with diverse backgrounds and skills, and building environments in which students can strengthen their identity and confidence as a statistician, data scientist, and human.

Keywords

undergraduate research

collaboration

sense of belonging

code reviews

small area estimation

survey statistics 

View Abstract 3097

Co-Author

Kelly McConville, Harvard University

First Author

Grayson White

Presenting Author

Grayson White

Teaching Practical Data Science

While there has been considerable work on guiding educators on how to structure a course in data science for imparting technical knowledge (e.g., Hicks and Irizarry (2018)), we argue, based on employer feedback and industry relations, that a larger part of the curriculum needs to be devoted to problem formulation, deployment, solution design, model monitoring, and communication of results. Emphasising these practical aspects imposes new requirements on the instructor and the coordinating department. An example of the demand on instructors is the breadth of knowledge they are required to have. The department, on the other hand, needs a steady stream of case studies for students to work on, a need exacerbated by increasing class sizes. In this talk, we present our observations and thoughts on these challenges, based on our experience teaching these topics over 4 semesters to more than 400 students (and growing).

1. Hicks, Stephanie C., and Rafael A. Irizarry. "A guide to teaching data science." The American Statistician 72, no. 4 (2018): 382 

Keywords

data science

practice

end-to-end

teaching

curriculum

syllabus 

View Abstract 2015

Co-Author

Sergio Hernandez-Marin, Professor

First Author

Vik Gopal

Presenting Author

Vik Gopal

Temporal Metrics Part II: Advancing the Understanding of Time Perception Through Deep Learning

This paper presents the second phase of the Temporal Metrics project, an innovative exploration (using mathematics and AI-augmented research) into the human perception of time using deep learning methodologies. Building on the foundational development of the Cr constant - a novel metric quantifying time perception variations - this phase extends the application to a deep learning model. The model predicts individual time perception categories - "Average," "Slower," or "Faster" - based on a comprehensive array of conditions and lifestyle factors, each weighted by the associated Cr value. The data for this study, derived from a theoretical sample representing 0.001% of the U.S. adult population, encompass demographic information, psychological conditions, lifestyle factors, and substance use. This project highlights the potent combination of theoretical constructs with advanced machine learning techniques, offering groundbreaking insights into the subjective experience of time. Our results demonstrate the model's high accuracy in predicting time perception categories, paving the way for future empirical research and potential applications in behavioral monitoring and mental health.

Keywords

Time Perception

Deep Learning

Weber's Law

Chat GPT 4.0

AI Augmented Research

Human time perception 

View Abstract 2848

First Author

Cammie Newmyer, Math That Makes Sense

Presenting Author

Cammie Newmyer, Math That Makes Sense

The Relation Between the Economy & Math Proficiency Within the United States

There has been robust research on the relationship between a country's economic performance (typically measured by GDP per capita) and the mathematical proficiency of its students (typically assessed by a standardized math score). Current research consistently shows a strong positive correlation between these two measures at the global level. The hypothesis of this paper is that this strong correlation weakens once the economy surpasses a certain threshold. Specifically, our research focuses on the United States and examines this relationship across its 50 states. We use each state's Grade 8 average math score from the National Assessment of Educational Progress (NAEP) and GDP per capita to investigate this relationship. Data visualization and statistical inference are used to quantify and reveal the relationship between these two measures. As hypothesized, the correlation in the United States is significantly diminished compared to what previous work has shown at the global level. This work will help policy makers understand the complex relationship between the economy and math performance in order to develop more effective strategies to enhance education.
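One common way to compare a state-level correlation with a previously reported global correlation is Fisher's z-transformation for two independent correlations. The abstract does not say which inference the authors used, so treat this as a sketch under that assumption; it also assumes the global estimate is independent of the state data and comes with a known sample size, and the correlations and sample sizes in the example are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_compare(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for two independent correlations.

    Uses Fisher's transformation z = atanh(r); the difference of transformed
    correlations has standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Hypothetical: r = 0.30 across 50 states vs. r = 0.80 across 70 countries.
    z, p = fisher_z_compare(0.30, 50, 0.80, 70)
    print(round(z, 2), p)
```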

Keywords

math proficiency

GDP per capita

correlation

National Assessment of Educational Progress

education policy

data visualization 

View Abstract 1993

First Author

Emily Wang

Presenting Author

Emily Wang

User Interpretation of Data Visualization Contents Across Chart Types

Data visualization enables people to convey complicated information in a visual format, but we do not know which types of visualizations help or hurt interpretation. Using the AmeriSpeak Omnibus survey, a biweekly, nationally representative survey of respondents from a probability-based survey panel, we examined user understanding and interpretation of different chart types. We asked panelists in three rounds of the survey the same questions targeting their understanding of the visual information presented to them, while varying the design of the chart each time. Participants were asked to estimate specific values shown and to determine whether certain statements were supported by the data displayed in the chart. This research analyzes participants' accuracy and identifies differences in response patterns by chart type and across population subgroups. Our findings will be used to improve data visualization practices and provide key insights about graphical literacy among U.S. adults.
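As a sketch of comparing accuracy between two chart types, the function below runs a two-proportion z-test on counts of correct answers. The counts are hypothetical, and the test is unweighted: it ignores the AmeriSpeak survey weights and design, which an analysis of a probability-based panel would normally need to account for.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Unweighted two-proportion z-test with a pooled standard error.

    x1/n1 and x2/n2 are correct answers / respondents for two chart types.
    Returns (difference in accuracy, z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return p1 - p2, z, 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical counts: bar chart 80/100 correct, pie chart 60/100 correct.
    print(two_proportion_z(80, 100, 60, 100))
```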

Keywords

Data Visualization

Visual Literacy

Charts 

View Abstract 3603

Co-Author(s)

Sydney Bell, NORC at The University of Chicago
Kiegan Rice, NORC at The University of Chicago
Heike Hofmann, Iowa State University

First Author

Taylor Wing, NORC at The University of Chicago

Presenting Author

Taylor Wing, NORC at The University of Chicago