Thursday, Aug 7: 8:30 AM - 10:20 AM
4216
Contributed Papers
Music City Center
Room: CC-106A
This session highlights novel statistical approaches to understanding inequality, social networks, and historical data. Presentations will explore alternative measures of inequality that challenge traditional metrics like the Gini coefficient, the role of probability distributions in debunking claims about the benefits of income inequality, and a federal agency's commitment to open science. Additional talks will examine the limitations of household-based social network analysis, Bayesian methods for linking historical records of enslaved individuals, and innovative statistical models for complex networks and spatially dependent data. Together, these studies push the boundaries of statistical methodology to uncover deeper insights into social and economic structures.
Main Sponsor
Social Statistics Section
Presentations
Probabilistic record linkage is an efficient method to connect records from the same entity across data sources without reliable identifiers. Commonly, variation present in the data is due to circumstance rather than error. For example, nicknames can be used in certain contexts rather than proper names. A record with non-erroneous variation tells one part of a greater story. We call such a record an "alias" of the entity from which it is derived. Entities with multiple aliases provide richer information to link entities, but the increased complexity requires a careful approach. Existing record linkage approaches use pre- or post-hoc methods to prevent conflicts due to aliases, which can lead to additional bias and an inability to quantify uncertainty. Instead of forcing the data to fit existing models, we propose a model to fit the data. Our fully Bayesian approach accounts for known aliases in the data and requires no post-hoc processing of link estimates, maintaining uncertainty quantification. We demonstrate the accuracy of our model and apply it to linking historical records of African Americans trafficked in the coastwise slave trade.
Keywords
record linkage
Bayesian inference
Historical data
uncertainty quantification
aliased data
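For orientation, the classical (non-Bayesian) form of probabilistic record linkage can be sketched as a Fellegi-Sunter style match weight. This is not the fully Bayesian alias model described in the abstract. The field names, the m- and u-probabilities, and the example records are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical agreement probabilities per comparison field (illustrative only):
#   m = P(field agrees | the two records refer to the same entity)
#   u = P(field agrees | the two records refer to different entities)
FIELD_PROBS = {
    "first_name": (0.90, 0.05),
    "last_name": (0.95, 0.02),
    "birth_year": (0.85, 0.10),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum of log2 likelihood ratios over fields, using exact agreement."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Made-up records: the second uses a nickname for the same first name.
formal = {"first_name": "William", "last_name": "Carter", "birth_year": 1820}
nickname = {"first_name": "Bill", "last_name": "Carter", "birth_year": 1820}

print(match_weight(formal, formal))    # all fields agree
print(match_weight(formal, nickname))  # nickname is scored as a disagreement
```

Under exact comparison the nickname is penalized as if it were an error, which is the kind of non-erroneous variation the abstract calls an alias.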
This talk describes two transforms of the Lorenz curve and related measures that are easy to understand and relate to older measures focusing on the lower and upper portions of the distribution, respectively. Reanalysis of US income data for 1993-2022 demonstrates that these measures are highly correlated with currently produced ones. Moreover, they can accommodate negative values, which occur in about one percent of income data and five to ten percent of wealth data. The U.S. Bureau of the Census regularly publishes measures, such as the mean log-deviation, but does not describe the procedures it uses. If the Bureau deletes negative incomes, it will underestimate inequality. Applying the alternative measures to wealth data from the USA and UK indicates that wealth inequality in both nations has increased more than the corresponding percentage increase in the Gini coefficient. These results imply that policy makers may not appreciate the degree to which income and wealth inequality have increased in the last thirty years if they rely on trends in the Gini coefficient.
Keywords
alternative measures of inequality
increase in income and wealth inequality
Gini coefficient
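The abstract does not define its two Lorenz-curve transforms, so the sketch below covers only the standard quantities it builds on: the Lorenz curve, the Gini coefficient, and the mean log-deviation. The function names and the toy data are assumptions made for illustration. The sketch shows why negative values are awkward for the mean log-deviation, which takes a logarithm of each value, and how dropping them changes the Gini coefficient in one small example.

```python
import math


def lorenz_curve(values):
    """Return (population share, cumulative value share) points, sorted ascending."""
    xs = sorted(values)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


def gini(values):
    """Gini coefficient as half the relative mean absolute difference."""
    n = len(values)
    mean = sum(values) / n
    mean_abs_diff = sum(abs(a - b) for a in values for b in values) / (n * n)
    return mean_abs_diff / (2.0 * mean)


def mean_log_deviation(values):
    """Mean of log(mean / x); defined only for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("mean log-deviation requires strictly positive values")
    mean = sum(values) / len(values)
    return sum(math.log(mean / v) for v in values) / len(values)


# Toy data with one negative value (illustrative, not from the talk).
with_negative = [-2, 1, 3, 6]
positives_only = [v for v in with_negative if v > 0]

print(lorenz_curve(with_negative))  # the curve dips below zero at the bottom
print(gini(with_negative), gini(positives_only))
```

In this toy example the Gini coefficient is lower once the negative value is dropped, and `mean_log_deviation(with_negative)` raises an error rather than returning a number.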
Standard statistical models for network structure (prominently including exponential-family random graph models, or ERGMs) begin with an exogenously specified vertex set, and posit probability distributions for the edge set conditional on the vertex set. In emergent networks in demographic exchange with their environments, however, the joint distribution of network size, composition, and structure is of potential interest. Here, we introduce a family of ERGMs with support on the set of graphs of arbitrary order, allowing for endogenous modeling of the vertex set. We also provide extensions to vertices with discrete-valued covariates, as well as a Markov-chain Monte Carlo scheme for simulating draws from both the homogeneous and inhomogeneous network distributions. Approximate likelihood-based inference using contrastive divergence and simulation-based adequacy checking are also discussed.
Keywords
Exponential Family Random Graph Models (ERGMs)
Networks
Random Graphs
Markov Chain Monte Carlo
Discrete Exponential Families
Relational Data
First Author
Carter Butts, University of California-Irvine
Presenting Author
Carter Butts, University of California-Irvine
This paper examines some of the ways that probability distributions (such as the exponential, lognormal, and Pareto families) advance knowledge across disciplines and topical domains, focusing on the social sciences. The paper begins with the titular case – how probability distributions expanded the meaning of inequality from inequality between persons to inequality between subgroups, thereby undermining the case for the beneficial effects of income inequality. For while it may have been straightforward to defend the beneficial incentive effects of inequality between persons, it is a different matter entirely to defend inequality between subgroups. One key element in this evolution was the increasing use of probability distributions, which made visible and inescapable a tight link between the two types of inequality. The paper then turns to four further applications in which probability distributions reveal new aspects of sociobehavioral phenomena: showing how inequality in ordinal characteristics differs from inequality in cardinal characteristics (for example, the Gini coefficient is constant), assessing new candidates for inequality measures (illustrating with the P90/P10 ratio and its sibling quantile ratios), showing how theoretical predictions differ across different distributional families (for example, for proportions integrationist and segregationist), and discerning in empirical data around the world how people form ideas of the just job income for themselves (for example, whether they fix on a constant or a multiple or compare to everyone).
Keywords: Inequality between persons and inequality between subgroups; inequality measurement; probability distributions; amounts and ranks; justice, status, power; just reward scenarios; Coleman Box
Keywords
inequality between persons and inequality between subgroups
inequality measurement
probability distributions
amounts and ranks
comparison, status, power
just reward scenarios
We are in a time of increased action to promote open science. Open scholarship, code, and data policies promote transparency by making research data, methods, and results readily accessible to a wider audience. Efforts within the federal system are underway to advance the principles of open science, and federal agencies are implementing policies to increase public access to the results of scientific research and identifying ways to enhance best practices for the preservation, discoverability, accessibility, and utility of research outputs. Federal statistical agencies are not as far along as other federal agencies in implementing open science practices, but they also face their own hurdles. This presentation will recap key discussions and outstanding questions around open science for federal statistical agencies. I will contextualize this with the development of Open Census, a new initiative at the US Census Bureau aimed at fostering open science practices in its research. Open Census will develop an intuitive and secure ecosystem for Census Bureau researchers to develop, publish, and disseminate their work, driving the highest standards of research integrity and transparency.
Keywords
Federal Statistics
Open Science
Open Source
Open Data
The item response theory (IRT) model is the benchmark method for modeling individual response differences in survey data. For instance, in ecological data, it can assess how well individuals perform in species identification, taking into account both the difficulty of identifying specific species and environmental variables. In that regard, the three-parameter item response model (U)sing (S)patially dependent item difficulties (3PLUS) provides a methodological approach that accounts for spatial dependencies in citizen science data while measuring users' abilities and item characteristics. Our contribution extends the 3PLUS model in two dimensions. First, we generalize the model to handle polytomous responses, expanding its applicability beyond binary outcomes. Second, we introduce Gaussian Process modeling for difficulty parameters, providing more flexibility in modeling spatial dependencies compared to the original conditional autoregressive prior specification. Through simulations and application to ecological citizen science data, we demonstrate more precise inference of item difficulties and participant abilities than the 3PLUS model.
Keywords
Item response theory model
spatial dependency
Gaussian process
Latent variable modeling
Large-scale data
ecological data
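As background for the abstract above, the standard three-parameter logistic (3PL) response function can be sketched as follows. This is the textbook binary 3PL form, not the 3PLUS model or its polytomous extension. The spatial dependence described in the abstract would enter through the prior on the difficulty parameter, which is not shown here. The function and parameter names are assumptions made for illustration.

```python
import math


def three_pl_probability(ability, discrimination, difficulty, guessing):
    """P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


# Illustrative values: a participant whose ability equals the item difficulty.
p = three_pl_probability(ability=0.5, discrimination=1.2, difficulty=0.5, guessing=0.2)
print(p)
```

When ability equals difficulty the logistic term is one half, so the probability sits halfway between the guessing floor and 1 (about 0.6 with the values above).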
Connections between people in communities are often collected and analyzed as either networks of individuals or networks of households. These two networks can differ in substantial ways. The methodological choice of which network to study is an important aspect of study design and data analysis. In this work we consider key differences between household and individual social network structure and ways in which the networks cannot be used interchangeably. We formalize the choices for representing each network and explore how social network analysis depends on these choices. We propose a systematic approach to determine the relevant network representation to study by assessing a series of entitativity criteria. We relate these criteria to theories and observations about household social dynamics and how they are affected by power structures and gender roles. We invoke the definition of an illusion of entitativity to classify when a household network does not satisfy these criteria in an experimental context. Given the widespread use of social network data for studying communities, there is broad impact in understanding which network to study and the consequences of that decision.
Keywords
Social networks
network science
study design