Text Analysis for Statisticians: Introduction to Advanced Language Modeling

Karl Pazdernik, Instructor
Pacific Northwest National Laboratory
 
Robin Cosbey, Instructor
Pacific Northwest National Laboratory
 
Saturday, Aug 3: 8:30 AM - 5:00 PM
CE_03C 
Professional Development Course/CE 
Oregon Convention Center 
Room: A106 
This course provides a broad overview of text analysis and natural language processing (NLP), beginning with a substantial amount of introductory material and extending to state-of-the-art methods. It covers every stage of the text analysis pipeline: data preprocessing, converting text to numeric representations (from simple aggregation methods to more complex embeddings), and training supervised and unsupervised learning methods for standard text-based tasks such as named entity recognition (NER), sentiment analysis, topic modeling, and text generation using Large Language Models (LLMs). The course will alternate between presentations and hands-on exercises in Python. Translations from Python to R will be provided for students more comfortable in R, and support will be given for both Mac and Windows users. Attendees should be familiar with Python (preferably), R, or both, and should have a basic understanding of statistics and/or machine learning. Attendees will gain the practical skills necessary to begin using text analysis tools for their own tasks, an understanding of the strengths and weaknesses of these tools, and an appreciation for the ethical considerations of using them in practice.

Main Sponsor

Section on Text Analysis

Co-Sponsors

Section on Statistical Learning and Data Science
Section on Statistics in Defense and National Security