05/02/2025: 8:25 AM - 9:55 AM MDT
Lightning
Room: Wasatch
This session will be followed by an e-poster session on May 2 from 11:05 - 11:30 AM.
Chair
Adam Loy, Carleton College
Presentations
This study investigates undergraduate student awareness of potential biases in data science. A survey of 20 undergraduate students assessed their understanding of how bias can manifest in data collection, analysis, and interpretation. The findings reveal a wide range of awareness levels, with 85% of students acknowledging some understanding of the concept but only 45% expressing confidence in their ability to explain it fully. Additionally, 35% of students admitted that their explanation of bias in data science would be mostly guesswork. These results highlight the need for increased educational efforts to ensure students are well-versed in the nuances of bias and its potential impact on data-driven decision-making.
Presenting Author
Mohammed Alam, Jacksonville State University
First Author
Mohammed Alam, Jacksonville State University
CoAuthor(s)
Jason Cleveland, Jacksonville State University
Janice Case, Jacksonville State University
The study investigates the common gaps that undergraduate students need to bridge to become and identify as doers of data science. Data science education plays a crucial role in shaping students' professional identities and influencing how they view themselves as active members and contributors to a professional community. The study participants included 39 undergraduate students enrolled in data science courses at a university in the Southeastern United States. These data science courses are open to students from all majors and programs on campus and do not require any prerequisites. Data was collected from a semi-structured interview prompt that asked students: "Can you think of one big change or smaller changes that you would need to experience to feel more comfortable identifying as a person who does data science?" Through thematic analysis, a total of seven themes were identified.
The findings indicated that gaining professional experience and a job title in data science represents the greatest gap identified by students. Students also identified earning a data science credential as an important gap in identifying as a doer of data science. Other gaps include: taking advanced data science coursework, developing related technical skills, applying data science in authentic contexts, working with complex data, and engaging in data science communities.
The outcome of this study presents ways data science programs can provide students with learning experiences that support the development of self-perception as data science doers. In particular, these findings could inform instructors and programs on designing and implementing authentic learning opportunities, such as working with real-world datasets and engaging in project-based learning, to equip students with applicable skills and professional expertise.
Presenting Author
Doreen Mushi, North Carolina State University
First Author
Doreen Mushi, North Carolina State University
CoAuthor
Sunghwan Byun, North Carolina State University
Our research emphasizes the importance of critically engaging with GenAI tools such as ChatGPT, Copilot, or Gemini, focusing on the dialogue in the GenAI tool prompt. In the same way that understanding keywords opens up search tools, effective dialogue techniques are essential for students to truly explore the power of GenAI, combined with an understanding of when GenAI tools help and when they hinder learning. Many instructors worry that GenAI allows students to "skip the boring parts," which are critical for deep understanding. For data-driven fields such as statistics, computer science, and data science, however, we can move quickly to a meta-level of understanding as we teach how the tools "think," including understanding of data privacy, assumption checking, model selection, and other "big ideas" embedded in GenAI that grow from our subject matter, such as inference, algorithm, and storytelling.
The presentation will showcase class assignments such as regression models or simple decision trees, designed to help students explore these issues, incorporating strategic questioning techniques and even asking GenAI about its own strengths and weaknesses. For small projects, working independently alongside a partner and an AI tool can help students develop key skills, while for larger assignments, treating a GenAI tool as a secondary peer reviewer can help students avoid fabricated references and absurd conclusions.
Presenting Author
K. Scott Alberts, Truman State University
First Author
K. Scott Alberts, Truman State University
CoAuthor
HyunJu Kim, Truman State University
Music producers often find themselves confronted with sprawling and disorganized sample libraries, which can cause inefficiencies in creative workflows. Sample libraries are collections of short audio files, which can be rendered or recorded samples of kicks, snares, vocals, synths, etc. Manual categorization of these audio files becomes time-consuming as the size of a collection grows, which can be remedied by using automated solutions such as sample managers.
However, sample managers are often fully-fledged programs which push producers to redefine existing workflows around the product they use. These programs are also costly: some, such as Samplism and Sononym, require single payments upwards of $60, while others, like Splice, rely on subscriptions.
This presentation discusses a minimalist and lightweight approach to sample management, utilizing a hybrid convolutional neural network (CNN) to classify audio samples into categories commonly used in music production. The model combines spectral feature extraction with time-domain analysis, using a 2D CNN and 1D CNN respectively.
By decoupling from bloated suites and focusing on a simple interface, we reduce cognitive load, letting producers focus on creativity, not software. The system is designed as a one-click solution: producers load their sample folder, click "Sort," and receive an organized library in seconds.
Presenting Author
Clark Allen, Utah Valley University
First Author
Clark Allen, Utah Valley University
How does social capital generated between class peers impact students' identity as data scientists? Does the type of instruction given in the classroom impact this relationship? We propose studying the peer networks generated in data science classrooms and exploring the influence of students' connections to their peers on their identity as data scientists. Specifically, we propose exploring social capital in terms of network centrality, connectedness, support and reciprocity. We propose operationalising data science identity in terms of one's feeling of belonging and comfort in data science and the relevance of data science to one's career. Finally, we explore whether student centredness vs. instructor centredness impacts the nature of this relationship, and whether group activities and project-based learning can facilitate the types of peer interactions that generate social capital and help data science students develop professional identities. To collect data, we are surveying students throughout the semester. We will also code the syllabus of the course in which they are enrolled in terms of the pedagogical structure of the course.
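As a rough illustration of two of the social capital measures named above (network centrality and reciprocity), the sketch below computes them from a directed list of peer nominations. The student labels, the nomination data, and both function names are hypothetical, and the abstract does not say which centrality measure the authors use; in-degree centrality is assumed here only because it is simple.

```python
def in_degree_centrality(edges, nodes):
    """Share of the other students who nominate each student as a peer."""
    counts = {node: 0 for node in nodes}
    # Deduplicate ties and ignore self-nominations.
    # Assumes every student named in an edge also appears in `nodes`.
    for source, target in set(edges):
        if source != target:
            counts[target] += 1
    others = len(nodes) - 1
    return {node: (count / others if others > 0 else 0.0)
            for node, count in counts.items()}


def reciprocity(edges):
    """Fraction of directed ties that are returned by the nominated peer."""
    ties = {(s, t) for s, t in edges if s != t}
    if not ties:
        return 0.0
    mutual = sum(1 for s, t in ties if (t, s) in ties)
    return mutual / len(ties)


# Hypothetical survey responses: (student, peer they turn to for help).
students = ["A", "B", "C", "D"]
nominations = [("A", "B"), ("B", "A"), ("C", "A"), ("C", "B")]

centrality = in_degree_centrality(nominations, students)
tie_reciprocity = reciprocity(nominations)
```

In this toy class, students A and B are each nominated by two of their three classmates, D is isolated, and half of all ties are reciprocated.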
Presenting Author
Tom Leppard
First Author
Tom Leppard
CoAuthor
Steve McDonald, North Carolina State University
In this talk and poster, we share information about the International Statistical Literacy Project (ISLP) poster competition and how we run it in Canada, from a national competition coordinator and university course instructor's viewpoints.
The ISLP is run by the education section of the International Statistical Institute (ISI), the International Association for Statistical Education (IASE), to support, create and participate in statistical literacy activities and promotion around the world. One of its initiatives is an international ISLP poster competition that runs every two years. More than twenty countries, including Canada and the USA, hold national competitions ahead of the international competition. This competition is an excellent opportunity to promote statistical literacy across each participating country and to engage with the international community.
Students (grade 4 to post-secondary) create posters to demonstrate their statistical literacy skills and teachers/instructors submit top posters through the competition website before Feb 28, 2025. Then, winners of the national competition will represent their country in the international competition which is taking place in spring 2025.
In Canada, the Statistical Society of Canada (SSC) introduced and ran Canadian ISLP poster competitions ahead of the last few international competitions, and it developed and maintains a Canadian ISLP poster competition website. The Statistical Education Committee (SEC) of the SSC oversees promotion of the competition, website maintenance, judging of posters, and submission of national winning posters to the international competition.
In this talk and poster, the chair of the SEC, who is also an associate professor in statistics at a Canadian university, will talk about promoting the competition to students and teachers, and about the results.
Presenting Author
Bingrui Sun, University of Calgary
First Author
Bingrui Sun, University of Calgary
We will present several web-based, interactive apps designed to aid in the instruction of two key concepts in machine learning: Classification Metrics and the Bias-Variance Trade-Off. These apps are intended to improve student understanding and engagement with seemingly abstract machine learning concepts. They are designed for students from all backgrounds since they visually explore concepts without requiring any mathematical calculations or coding. Both apps are designed using the Shiny package in R and will be accessible via a web browser for interested audience members. In our lightning presentation, we will demonstrate the app functionality and briefly discuss how they can be used in and outside class.
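For readers unfamiliar with the classification metrics mentioned above, the following minimal sketch shows how the most common ones follow from the four cells of a binary confusion matrix. It is written in Python purely for illustration (the apps themselves are built with Shiny in R), and the function name and the counts are hypothetical rather than taken from the apps.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the cases predicted positive, how many were positive?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive cases, how many were found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 100 classified cases.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these counts the classifier is right 70% of the time, yet it recovers only two thirds of the true positives, which is the kind of contrast between metrics that an interactive view can make visible.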
Presenting Author
Eric Friedlander
First Author
Eric Friedlander
CoAuthor
Abhishek Chakraborty, Lawrence University
Sonification, the use of sound to represent data, is a subject that has remained largely unexplored in the context of traditional data visualizations in recent years. However, given that sound in the form of music can elicit an emotional response from humans, it is worth exploring whether sonified data can evoke a similar emotional impact. In our study, we survey approximately two hundred undergraduate students in introductory statistics courses to determine 1) whether simulated sonified data in the form of boxplots can elicit an emotional response from participants, 2) whether there is a relationship between the emotional response to sonified data and the modal chord in which the data is played, and 3) whether this relationship differs between simulated data and real data with a context. Overall, this study seeks to determine whether the use of sonification as an alternative or complement to visualizations can foster a deeper, more intuitive connection to raw data.
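To make the idea of a sonified boxplot concrete, here is one possible mapping, sketched under stated assumptions: each value of a five-number summary is scaled linearly onto a one-octave range of MIDI note numbers, which are then converted to frequencies with the standard equal-temperament formula (A4 = MIDI note 69 = 440 Hz). The abstract does not describe the authors' actual mapping or their handling of modal chords, so the function names, the note range, and the data below are all hypothetical.

```python
def value_to_midi(value, low, high, midi_low=60, midi_high=72):
    """Scale a data value linearly onto a range of MIDI note numbers."""
    if high == low:
        return midi_low
    position = (value - low) / (high - low)
    return round(midi_low + position * (midi_high - midi_low))


def midi_to_hz(note):
    """Equal-temperament frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Hypothetical five-number summary: min, Q1, median, Q3, max.
summary = [2.0, 4.0, 6.0, 8.0, 10.0]
notes = [value_to_midi(v, summary[0], summary[-1]) for v in summary]
frequencies = [midi_to_hz(n) for n in notes]
```

Playing the five resulting pitches in order would let a listener hear the spread of the data: a wide interquartile range produces a large jump between the second and fourth notes.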
Presenting Author
Donya Behroozi
First Author
Donya Behroozi
CoAuthor(s)
Julia Schedler, California Polytechnic State University, San Luis Obispo
Sinem Demirci, California Polytechnic State University
This poster examines using ChatGPT as a feedback tool for R programming challenges. ChatGPT offers crucial feedback by identifying errors, explaining code behavior, and suggesting optimizations, which enhances problem-solving and accelerates learning. The poster showcases ChatGPT's use in education, research, and industry. It provides real-time error detection, personalized code improvement tips, and contextual explanations of R functions. Its interactive dialogue model supports iterative learning and helps users create high-quality R scripts efficiently. This poster highlights how incorporating ChatGPT into R programming can enhance feedback for novice and expert programmers, setting a new standard for AI-assisted learning and productivity in data science.
Presenting Author
Marcelo Guerra Hahn, Lake Washington Institute of Technology
First Author
Marcelo Guerra Hahn, Lake Washington Institute of Technology
The textile industry, particularly the textile dyeing sector, is widely recognized as an unsustainable activity due to its excessive consumption of water, energy, and chemicals, all of which contribute to severe environmental degradation. Beyond pollution, the textile industry's waste crisis further exacerbates its environmental impact. The environmental damage caused by textiles stems from multiple factors, including production, transportation, and disposal, all of which contribute to climate change, pollution, and ecosystem degradation. The unsustainable nature of textile production does not only affect ecosystems; it also has significant human health implications. These extensive environmental and human health impacts necessitate a structured and actionable approach to sustainability in the textile industry.
One of the major barriers to sustainability in the industry is the lack of a standardized, publicly accessible database that compares the environmental impact of different textile materials. Most of the existing Life Cycle Assessments (LCAs) provide some insights, but they are often fragmented, making it difficult for companies to make informed sourcing decisions. Furthermore, sustainability metrics must be tailored to different stakeholders. The aim of this research is to develop a comprehensive, publicly accessible LCA data visualization for different types of fibers. This system would serve as a decision-making resource for sustainability efforts within the textile industry, allowing decision-makers to:
Compare the environmental impact of different textile fibers based on standardized data.
Track performance over time to assess progress toward sustainability targets.
Validate sustainability claims, enabling brands to communicate transparent and credible information to consumers.
Inform policy decisions, guiding regulations on textile production, dyeing, and waste management.
Presenting Author
Zahra Saki, NC State University
First Author
Zahra Saki, NC State University
CoAuthor(s)
Karen Leonas, North Carolina State University
Melissa Sharp, North Carolina State University
As internet connectivity rapidly advances, safeguarding user privacy has become paramount. DNS over HTTPS (DoH), a novel technology, was created to enhance internet users' privacy protection. DoH encapsulates queries and responses within Hypertext Transfer Protocol Secure (HTTPS) and can replace traditional DNS for domain name resolution. While DoH offers benefits, it also presents challenges. Although it improves user privacy, its encapsulation mechanism complicates detection for enterprises employing conventional methods to monitor network activity and potential threats. This study explores features that effectively represent DoH traffic for classification, as these features directly impact the model's classification accuracy. We utilized the publicly available CIRA-CIC-DoHBrw2020 dataset for comparative analysis and experimentation. To determine feature importance, we categorized the dataset's features into two types: those with and those without network-specific characteristics. We then developed a One-dimensional Convolutional Neural Network model based on features that accurately represent DoH traffic. The model, built on the classified features, distinguishes DoH traffic from other network traffic with enhanced precision. We evaluated the proposed method's performance using accuracy metrics, achieving an accuracy score of 99.06.
Presenting Author
Hussein Abrahim, Zhengzhou University
First Author
Hussein Abrahim, Zhengzhou University
Recent studies link air pollution exposure to health, and it is therefore critical to identify spatial-temporal regions where such exposure risks are high. People and the environment are known to be more adversely affected by excessive levels of pollution, hence the need to visualize and model the effects of various quantiles of fine particles in addition to mean effects. We propose versatile tools to describe and visualize quantiles of complex space-time data with various degrees of missingness and show how dynamic views provide useful insights into where and when the process changes. This approach does not require Gaussianity or stationarity and helps to guide future modeling efforts. We study daily PM2.5 concentrations for the years 2020-2024 collected at 108 locations in the states of New York, New Jersey, and Pennsylvania, illustrate how PM2.5 exposure risks evolve over space and time, and identify possible clusters. This approach demonstrates the importance of effective dynamic visualizations of complex spatial-temporal datasets by providing relevant summaries with their corresponding confidence regions.
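As a simple illustration of summarizing quantiles when some observations are missing, the sketch below computes an empirical quantile for one monitoring site while skipping missing readings. It is not the authors' tool: the linear-interpolation rule, the function name, and the PM2.5 readings are assumptions made for the example, and the abstract's confidence regions and space-time modeling are not covered.

```python
import math


def empirical_quantile(values, q):
    """Quantile of the non-missing values using linear interpolation.

    Missing readings may be given as None or NaN. Returns None when
    no reading was observed at all.
    """
    observed = sorted(v for v in values
                      if v is not None and not math.isnan(v))
    if not observed:
        return None
    position = q * (len(observed) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return observed[lower] * (1 - weight) + observed[upper] * weight


# Hypothetical daily PM2.5 readings (micrograms per cubic meter).
site_a = [8.0, None, 12.0, 10.0, float("nan"), 30.0]
site_b = [None, None]

median_a = empirical_quantile(site_a, 0.5)
upper_a = empirical_quantile(site_a, 0.9)
median_b = empirical_quantile(site_b, 0.5)
```

Here the 0.9 quantile at the first site is pulled far above the median by a single high reading, which illustrates why upper quantiles can tell a different story about exposure risk than averages do.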
Presenting Author
Dana Sylvan, Hunter College, City University of New York
First Author
Dana Sylvan, Hunter College, City University of New York
CoAuthor(s)
Danielle Elterman, City University of New York Hunter College
Peter F Craigmile, City University of New York Hunter College