
Artificial Intelligence and Official Statistics: Responsibly Leveraging Large Language Models in Support of Open Data

Presented During: CS005 NISS-FCSM: AI in Federal Government (Pt 1)

Conference: Symposium on Data Science and Statistics (SDSS) 2024

06/05/2024: 1:20 PM - 1:45 PM EDT
Special Event

Description

One of the fundamental responsibilities of a statistical agency is to produce and publicly disseminate relevant, accurate, and credible statistical information. However, the scale and complexity of some of these data products (file size, number of variables, technical documentation) can hinder their direct use by non-technical audiences. Consequently, third parties often repackage and share that information in myriad ways to make it more accessible and interpretable to the average person. Such repackaging by non-authoritative sources may affect the integrity of the underlying statistics, calling their accuracy or credibility into question. Emerging technologies like mass-market Large Language Models (LLMs) and other generative artificial intelligence (AI) applications may give statistical agencies an opportunity to disseminate statistics more directly to the average web user, but only if AI can properly and efficiently ingest and interpret the official statistics. The U.S. Department of Commerce, one of the world's largest producers of public data, has assembled a working group to help realize the benefits and mitigate the risks of AI models for finding, linking, and interpreting the Department's data. The goal is to advance dissemination standards for data and statistics from being machine-readable to being machine-understandable, capturing and conveying the information's context, structure, and meaning. This working group is currently drafting technical guidelines for publishing AI-ready open data. The Department of Commerce welcomes engagement from industry, academia, and other partners across the public data ecosystem. We will share the progress of the working group and elicit your feedback.

Keywords

Artificial Intelligence

Large Language Models

Generative AI

Natural Language Processing

Official Statistics

Open Data

Presenting Author

Sallie Keller, University of Virginia

First Author

Sallie Keller, University of Virginia

CoAuthor(s)

Michael Hawes, U.S. Census Bureau
Kenneth Haase, U.S. Census Bureau

Tracks

Practice and Applications

Symposium on Data Science and Statistics (SDSS) 2024