Tuesday, Aug 5: 8:30 AM - 10:20 AM
0342
Invited Paper Session
Music City Center
Room: CC-208B
Aleatoric variability
Artificial intelligence
Data-driven decision
Epistemic variability
Machine learning
Official statistics
Applied
Yes
Main Sponsor
National Institute of Statistical Sciences
Co Sponsors
Government Statistics Section
Presentations
Camera traps are a widespread, non-invasive, cost-effective method to monitor animal populations, and researchers using camera traps span diverse disciplines and geographies. The time and labour required to manually classify the potentially millions of images generated by a single camera array presents a significant challenge. Reducing this burden facilitates implementation of larger, longer-lasting camera trap arrays, resulting in more comprehensive analyses and better decision frameworks. To address this challenge, a multi-agency USDA team has developed CameraTrapDetector, a free, open-source tool that deploys computer vision models at the class, family, and species (Nclasses=63, mAP(50-95)=0.878, F1=0.919) taxonomic levels to detect, classify, and count animals in camera trap images. The tool is available as an R package with an R Shiny interface, a desktop application, or a command-line Python script for easy integration into many analytical pipelines. The tool enables users to retain complete data privacy, and developers maintain a transdisciplinary, multi-institutional working group of camera trap researchers to advance best practices. An iterative training cycle applies state-of-the-art computer vision approaches, adds new images from project partners to train new models, and incorporates user feedback and goals into the tool's development. A primary goal, and challenge, for the models is generalization to out-of-site images; results are less accurate and more variable than metrics for test (unseen) in-site images. Results on test data (Nclasses=12) show major improvements in generalization from the version 2 model (mAR = 0.195, range 0.07-0.98) to the version 3 model (mAR = 0.606, range 0.05-1.00). Faster, more accurate, more generalizable models allow CameraTrapDetector users to turn raw images into quantifiable data for estimating animal presence, population size, and movement.
Our open-source pipeline may also be leveraged to train species-specific computer vision models to answer questions about animal behaviour or disease detection. By automating image processing, CameraTrapDetector accelerates research and redirects critical human resources to more analytical tasks.
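The detect-classify-count workflow described above reduces, at its simplest, to filtering model detections by confidence and tallying animals per species. A minimal sketch (the detection records, field layout, and threshold below are illustrative assumptions, not CameraTrapDetector's actual output schema):

```python
from collections import Counter

# Hypothetical detection records as a vision model might emit them:
# (image_id, predicted_species, confidence). Made-up values.
detections = [
    ("IMG_0001", "white-tailed_deer", 0.94),
    ("IMG_0001", "white-tailed_deer", 0.88),
    ("IMG_0002", "raccoon", 0.71),
    ("IMG_0003", "white-tailed_deer", 0.42),  # low confidence, dropped
    ("IMG_0004", "wild_pig", 0.90),
]

def count_species(dets, threshold=0.5):
    """Filter detections by confidence and tally animals per species."""
    kept = [species for _, species, conf in dets if conf >= threshold]
    return Counter(kept)

counts = count_species(detections)
# counts: Counter({'white-tailed_deer': 2, 'raccoon': 1, 'wild_pig': 1})
```

In practice the per-image counts, not just the totals, feed downstream occupancy and abundance analyses.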
Keywords
Computer Vision
Deep Learning
Artificial Intelligence
Animal Behavior
Image Processing
An experiment carried out within the 2020 Census evaluated a new training module for bilingual enumerators to aid in administering the census questionnaire to Spanish-speaking households during nonresponse followup. An objective of the experiment was to study the association between enumerator training and response rate. We describe statistical considerations that emerged in several stages of the experiment. Work carried out before the census formulates a test and studies its power to justify the scope of the experiment. Work after the census analyzes data collected in the field. Both stages utilize a multinomial model with a continuation-ratio logit link to jointly capture response probabilities over multiple contact attempts. We present the methodology, findings, and lessons from the experiment.
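The continuation-ratio link mentioned above models the conditional (discrete-hazard) probability of response at each contact attempt given no response so far. A minimal sketch of those conditional probabilities, using made-up counts and an assumed three-attempt structure (not the experiment's actual figures):

```python
# Made-up multinomial counts over three contact attempts:
# responded[k] = households first responding at attempt k+1;
# the remainder never responded.
total = 1000
responded = [300, 150, 50]

def continuation_ratio_probs(total, responded):
    """Conditional probabilities p_k = P(respond at attempt k | no response
    yet) -- the quantities a continuation-ratio logit model links to
    enumerator covariates via logit(p_k) = x'beta_k."""
    probs, at_risk = [], total
    for n_k in responded:
        probs.append(n_k / at_risk)  # discrete hazard at attempt k
        at_risk -= n_k               # shrink the still-unresolved pool
    return probs

p = continuation_ratio_probs(total, responded)
# p[0] = 300/1000 = 0.300; p[1] = 150/700 ~ 0.214; p[2] = 50/550 ~ 0.091
```

Modeling each conditional probability separately is what lets the multinomial outcome over attempts be fit as a sequence of binary (logit) regressions.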
Automated valuation models (AVMs) have gained popularity with the rise of online platforms like Zillow. However, the noisy nature of residential home sales data poses challenges for estimation and prediction methods that assume normality. As a result, some AVMs exhibit significant bias and imprecision during validations. To address this issue, we employed a robust regression method (MM-estimation) combined with a bootstrapping procedure (Stine 1985) to downweight outliers. This approach yielded unbiased and precise regression estimates of residential prices, as demonstrated through k-fold validation. Additionally, this method provides a confidence interval for each residential property, enabling property-level hypothesis testing, which is uncommon for AVMs. The k-fold validation confirms that these confidence intervals attain the required level of statistical confidence.
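MM-estimation as used in the abstract (available, e.g., in R's robustbase or in statsmodels) combines a high-breakdown S-estimate of scale with an efficient M-step; the sketch below illustrates only the core downweighting idea with a simple Huber M-estimator of location plus a percentile bootstrap interval. The prices, tuning constant, and bootstrap settings are illustrative assumptions, not the paper's method:

```python
import random
import statistics

def huber_location(xs, c=1.345, tol=1e-8, max_iter=100):
    """IRLS for the Huber M-estimate of location: large residuals get
    weight c*s/|r| < 1, so outliers are downweighted, not deleted."""
    mu = statistics.median(xs)
    # Robust scale: median absolute deviation (normal-consistent)
    s = statistics.median(abs(x - mu) for x in xs) / 0.6745 or 1.0
    for _ in range(max_iter):
        w = [min(1.0, c * s / abs(x - mu)) if x != mu else 1.0 for x in xs]
        mu_new = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

def bootstrap_ci(xs, estimator, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for any estimator."""
    rng = random.Random(seed)
    stats = sorted(estimator([rng.choice(xs) for _ in xs])
                   for _ in range(n_boot))
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical sale prices ($1000s) with one gross outlier:
prices = [210, 225, 199, 240, 215, 230, 2050]
est = huber_location(prices)          # near the bulk of the data, ~225
lo, hi = bootstrap_ci(prices, huber_location)
```

The same pattern, robust point estimate plus a resampled interval per property, is what enables the property-level hypothesis testing the abstract describes.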
Keywords
Automated valuation models
robust regression method
bootstrapping
Synthetic data generation is a statistical tool used to alter data to enhance the privacy of record-level information while maintaining the distributional properties of the original population. After processing the data, statistical agencies typically apply privacy-enhancing methods for generating privatized summaries that appear in official publications. These official summaries are often generated using model-based adjustments to account for potential issues due to undercoverage, nonresponse, and misclassification with respect to the population of interest. These adjustments can be produced using dual-system (DSE) or triple-system estimation (TSE) models. Moreover, calibration procedures further adjust the weights to produce estimates that meet known population benchmarks. Although the study of total error variability is well-developed for these standard statistical processes, it often disregards privacy mechanisms and related concerns over disclosure risk. In this paper, the use of an algorithm to generate protected microdata is proposed to study the uncertainty of a census under a novel definition of differential privacy. To better understand the properties of the proposed algorithm, real confidential microdata are substituted with altered microdata before typical estimation procedures are performed. This approach is tested on data from the 2022 US Census of Agriculture (including the June Area Survey and FSA administrative data as second and third lists), using the accuracy, precision, utility, and disclosure risk of the final statistical summaries as metrics for comparing the data with and without privatization mechanisms.
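The DSE adjustment mentioned above is, in its simplest form, the classical two-list capture-recapture (Lincoln-Petersen) estimator; TSE extends the same idea to three lists. A minimal sketch with made-up counts (the farm figures below are illustrative, not Census of Agriculture data):

```python
# Dual-system (capture-recapture) estimate of a population total:
# N_hat = n1 * n2 / m, where n1 and n2 are the two list sizes and
# m is the number of units matched across both lists.
def dual_system_estimate(n1, n2, m):
    """Lincoln-Petersen estimator; assumes independent lists."""
    if m == 0:
        raise ValueError("no overlap between lists; DSE is undefined")
    return n1 * n2 / m

# e.g., 800 farms on the census list, 600 on the survey list, 480 matched:
n_hat = dual_system_estimate(800, 600, 480)  # -> 1000.0
```

Running the same estimator on privatized versus confidential microdata is one way to measure how much a privacy mechanism perturbs the final adjusted totals.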
Keywords
Capture-recapture models
Census of Agriculture
Disclosure risk
Neural Networks
Triple-system estimation
Variance estimation