Wednesday, Aug 6: 8:30 AM - 10:20 AM
4139
Contributed Papers
Music City Center
Room: CC-205C
Main Sponsor
Government Statistics Section
Presentations
AI-powered chatbots are promising tools for the public sector. They can make official statistics more findable, accessible, and interpretable for more constituents. However, AI chatbots also create risk. They can return incorrect or misleading information, and they are often vulnerable to misuse. To date, most statistical agencies have viewed this risk-reward tradeoff as unacceptable. In this work, we present a toolkit for reducing risk from AI chatbots. We use the motivating example of a chatbot enabling interaction with the findings of a large, federal survey. We discuss six risk reduction tools: 1) system guardrails, 2) Q&A interfaces, 3) open-source models, 4) extractive responses, 5) system validation, and 6) red-teaming. These tools help prevent misuse, reduce the likelihood of incorrect or misleading responses, and avoid privacy concerns. We conclude with a roadmap for AI chatbot implementation following a responsible AI framework.
Keywords
AI
Chatbot
LLM
RAG
Responsible AI
AI-ready data
The USDA's National Agricultural Statistics Service (NASS) produces over 400 reports on virtually all facets of U.S. agriculture each year. To produce official statistics, NASS relies on hundreds of software programs to draw samples, analyze data, and develop estimates. Various programming languages have been used over time, some of which are expensive, unsupported, or no longer recommended. As part of its modernization process, NASS is converting legacy production code into other languages, a time-consuming effort. Generative artificial intelligence (GenAI) may be able to assist in the timely and accurate conversion of existing programs into contemporary freeware languages. A series of studies has been conducted to assess the effectiveness of certain GenAI tools in assisting with technical tasks. The resulting gains in the productivity, efficiency, and quality of code development within the USDA's technology and research production are evaluated. Compliance with USDA security protocols and related privacy requirements when using such tools is also discussed. This research supports the creation of a set of informed principles for responsible use of GenAI within the USDA.
Keywords
generative A.I.
code conversion
productivity
data security
Co-Author
Linda Young, Young Statistical Consulting LLC
First Author
Alex Tarter, National Agricultural Statistics Service
Accurate coding of federal survey write-in responses to standardized concept lists is essential for incorporating these responses into downstream statistical products. However, coding by a trained specialist is resource intensive. We examine automated coding of write-in responses to race and ethnicity questions on the United States decennial census to over 1,600 standardized concept codes using artificial intelligence and machine learning (AI/ML) techniques. Since any subset of codes may be assigned to a response, we format the task as a multilabel classification problem. We benchmark fuzzy lookups, classical machine learning, and transformer-based classifiers for the coder model and evaluate at both the response and code levels. To facilitate automation, we train a second ML model to generate a probability that the predicted codes are an exact match to codes that would have been assigned by a residual coder. Performance is evaluated with both intrinsic (e.g., F1 score) and extrinsic (e.g., simulation) metrics. Overall, AI/ML methods show potential for automated coding of race and ethnicity write-in responses in federal surveys.
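The abstract above distinguishes evaluation at the response level from evaluation at the code level. A minimal sketch of one way those two views could be computed for a multilabel task follows. It is illustrative only and not the authors' method: the function names, the choice of micro-averaged F1 as the code-level metric, and the concept codes ("A01", "B02", and so on) are all assumptions made for this example.

```python
from typing import List, Set


def exact_match_rate(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Response level: share of responses whose predicted code set
    equals the reference code set exactly."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def micro_f1(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Code level: F1 computed over individual (response, code) pairs,
    pooled across all responses (micro-averaging)."""
    tp = sum(len(t & p) for t, p in zip(true, pred))  # codes correctly assigned
    fp = sum(len(p - t) for t, p in zip(true, pred))  # codes assigned in error
    fn = sum(len(t - p) for t, p in zip(true, pred))  # codes missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical reference and predicted code sets for three write-in responses.
true_codes = [{"A01"}, {"A01", "B02"}, {"C03"}]
pred_codes = [{"A01"}, {"A01"}, {"C03", "D04"}]

response_level = exact_match_rate(true_codes, pred_codes)
code_level = micro_f1(true_codes, pred_codes)
print(f"exact match rate: {response_level:.3f}")
print(f"micro F1: {code_level:.3f}")
```

In this toy data only one of three responses is an exact match, while most individual codes are correct, which shows why the two levels can tell different stories about the same coder model.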
Keywords
Artificial intelligence
Machine learning
Automated coding
Federal surveys
Multilabel learning
The 8th Edition of Principles and Practices for a Federal Statistical Agency supports the essential role of relevant, credible, trusted, independent, and innovative government statistics. Since 1992, this report has described the characteristics of effective federal statistical agencies. Government statistics are widely used to inform decisions by policymakers, program administrators, businesses and other organizations, as well as households and the general public.
Principles and Practices is a concise tool to communicate the unique responsibilities of federal statistical agencies. It underscores the invaluable role that relevant, timely, accurate, and trustworthy government statistics play in informing the public and policymakers. Since 2001, an updated edition has been released at the beginning of each presidential term.
This eighth edition retains the five principles and ten practices established in prior editions. It includes updated examples and extensive appendices that reflect the many and varied changes across the national statistical system since the passage of the Foundations for Evidence-Based Policymaking Act of 2018 and the CHIPS and Science Act.
Keywords
federal statistical agencies
principles
practices
statistical policy
Co-Author(s)
Melissa Chiu, National Academies of Sciences, Engineering, and Medicine
Jennifer Park, National Academies of Sciences, Engineering, and Medicine
First Author
Katharine Abraham, University of Maryland
Presenting Author
Jennifer Park, National Academies of Sciences, Engineering, and Medicine