Artificial Intelligence and Official Statistics: Responsibly Leveraging Large Language Models in Support of Open Data
  
  
              
            
               Conference: Symposium on Data Science and Statistics (SDSS) 2024
          
  
   
   
   
   06/05/2024: 1:20 PM - 1:45 PM EDT
   
              
               Special Event 
               
   
   
   
   
      
    One of the fundamental responsibilities of a statistical agency is to produce and publicly disseminate relevant, accurate, and credible statistical information. The scale and complexity of some of these data products (file size, number of variables, technical documentation), however, can hinder their direct use by non-technical audiences. Consequently, third parties often repackage and share that information in myriad ways to make it more accessible and interpretable to the average person. Repackaging by non-authoritative sources, however, may compromise the integrity of the underlying statistics, calling their accuracy or credibility into question. Emerging technologies such as mass-market Large Language Models (LLMs) and other generative artificial intelligence (AI) applications may give statistical agencies an opportunity to disseminate statistics more directly to the average web user, but only if AI can properly and efficiently ingest and interpret the official statistics. The U.S. Department of Commerce, one of the world's largest producers of public data, has assembled a working group to help realize the benefits and mitigate the risks of AI models for finding, linking, and interpreting the Department's data. The goal is to advance dissemination standards for data and statistics from machine-readable to machine-understandable, capturing and conveying the information's context, structure, and meaning. The working group is currently drafting technical guidelines for publishing AI-ready open data. The Department of Commerce is interested in engagement from industry, academia, and other partners across the public data ecosystem. We will share the progress of the working group and elicit your feedback.
   
         
         Artificial Intelligence
Large Language Models
Generative AI
Natural Language Processing
Official Statistics
Open Data 
      
      
      
                         
Presenting Author
                         
                Sallie Keller, University of Virginia 
                  
               
                         First Author
                         
                Sallie Keller, University of Virginia 
                  
               
                         CoAuthor(s)
                         
                Michael Hawes, U.S. Census Bureau 
                  
               
                Kenneth Haase, U.S. Census Bureau 
                  
               
      
   
                  Tracks
                  
               Practice and Applications
               
         
    
   
   