Future of Statistics and Data Science in the Era of ChatGPT and Beyond

Abstract Number:

1036 

Submission Type:

Invited Panel Session 

Participants:

Himel Mallick (1), Frauke Kreuter (2), James Zou (3), Xijin Ge (4), Frauke Kreuter (2), Anna-Carolina Haensch (5), Federico Ferrari (2), Himel Mallick (1)

Institutions:

(1) Cornell University, N/A, (2) N/A, N/A, (3) Stanford University, N/A, (4) South Dakota State University, United States, (5) LMU Munich, Germany

Chair:

Himel Mallick  
Cornell University

Co-Organizer:

Frauke Kreuter  
N/A

Panelist(s):

James Zou  
Stanford University
Xijin Ge  
South Dakota State University
Frauke Kreuter  
N/A
Anna-Carolina Haensch  
LMU Munich
Federico Ferrari  
N/A

Session Organizer:

Himel Mallick  
Cornell University

Session Description:

Since ChatGPT's release in November 2022, we have entered a new era of AI-powered technologies that are revolutionizing every aspect of our lives. With the ability to source, collect, and analyze data, ChatGPT can create content, answer questions, and even write code at remarkable speed, providing personalized insights and recommendations in real time. As AI technology advances, more and more people are recognizing the potential long-term consequences, ethical implications, and privacy concerns of ChatGPT and related Generative AI (GAI) technologies and large language models (LLMs), which are poised to have a significant impact on our society. These changes are affecting every line of business, and we are only beginning to understand the full extent of their impact.

As an example, ChatGPT's ability to generate introduction and abstract sections for scientific articles has raised ethical questions. Some scientific journals have even banned the use of ChatGPT as a co-author, as several papers have already listed it as such. In the healthcare industry, the potential uses and concerns surrounding ChatGPT are currently under scrutiny by professional associations and practitioners. Likewise, in the education sector, there are concerns about students using ChatGPT to outsource their writing assignments, as well as the risk of unintentional plagiarism resulting from the use of an AI tool that may output biased or nonsensical text.

These ethical concerns have also extended to the field of statistics and data science, where the use of ChatGPT and other AI tools can potentially amplify biases in data analysis and decision-making. As the use of AI in various industries becomes more prevalent, it is crucial to address these ethical implications and work towards developing responsible AI practices. In this late-breaking panel, we will delve into the background of ChatGPT and examine its impact on society as a whole, with a particular focus on its implications for the future of statistics and data science. Additionally, we will explore the potential for ChatGPT to enhance efficiency in our work processes.

Our aim is to gain a deeper understanding of the benefits and potential risks associated with ChatGPT's impact on the field of data science by analyzing its various applications. In this panel discussion, we intend to explore how ChatGPT can be leveraged to optimize efficiency and productivity while ensuring ethical and responsible use. By doing so, we hope to identify areas where this rapidly developing technology can be most effectively utilized for the greater good.

The panelists bring diverse experience. Dr. James Zou recently published several papers on ChatGPT, examining both its performance over time and its role in data science education. Dr. Xijin Ge is the founder of RTutor.ai, which uses ChatGPT as a backend to translate natural language into R code that is then executed. Drs. Frauke Kreuter and Anna-Carolina Haensch have jointly published a paper on the impact of ChatGPT in a classroom setting. Finally, Dr. Federico Ferrari currently leads an internal working group on LLMs at Merck. As a result, the panel will cover the viewpoints of both academia and industry. The discussion will not only focus on the benefits and potential risks of ChatGPT but also address a combination of pre-specified and live questions.

Sponsors:

International Indian Statistical Association 3
Section on Statistical Learning and Data Science 2
Section on Statistics and Data Science Education 1

Theme: Statistics and Data Science: Informing Policy and Countering Misinformation

Yes

Applied

Yes

Estimated Audience Size

Extra Large (>275)

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is nonrefundable.

I understand