Econometric Theory, Model Misspecification, and Inference

Pilar Poncela, Chair
Universidad Autónoma de Madrid
 
Monday, Aug 4: 8:30 AM - 10:20 AM
4036 
Contributed Papers 
Music City Center 
Room: CC-205C 

Main Sponsor

Business and Economic Statistics Section

Presentations

Mosaic inference on panel data

The analysis of panel data via linear regression is ubiquitous across disciplines. However, standard confidence intervals typically assume that the residuals are cluster-independent. This paper introduces a method called the mosaic permutation test that can be used to (a) test this assumption and (b) weaken it. We elaborate on these contributions below.

Testing: Our method can use flexible machine learning techniques to detect violations of the cluster-independence assumption while exactly controlling false positives under a mild "local exchangeability" condition. To illustrate our method, we survey the literature and assess whether cluster-independence assumptions are accurate.

Inference: Our method produces confidence intervals for linear models that are (i) finite-sample valid under a local exchangeability assumption and (ii) asymptotically valid under the cluster-independence assumption. In short, our method is valid under assumptions that are strictly weaker than those required by classical methods. Experiments on real, randomly selected datasets from the literature show that many existing standard errors are up to ten times too small, whereas mosaic methods produce reliable results.

Keywords

Panel data

Permutation tests

Linear regression

Semiparametric models

Hypothesis tests 

Co-Author(s)

Rina Barber
Emmanuel Candes, Stanford University

First Author

Asher Spector

Presenting Author

Asher Spector

Pairwise difference representations of moments: Gini and generalized Lagrange identities

We provide pairwise-difference (Gini-type) representations of higher-order central moments for both general random variables and empirical moments. Such representations do not require a measure of location. For third and fourth moments, this yields pairwise-difference representations of skewness and kurtosis coefficients. We show that all central moments possess such representations, so no reference to the mean is needed for moments of any order. This is done by considering i.i.d. replications of the random variables considered, by observing that central moments can be interpreted as covariances between a random variable and powers of the same variable, and by giving recursions which link the pairwise-difference representation of any moment to lower order ones. Numerical summation identities are deduced. Finally, through a similar approach, we give analogues of the Lagrange and Binet-Cauchy identities for general random variables, along with a simple derivation of the classic Cauchy-Schwarz inequality for covariances. 
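As a small numerical illustration of the idea, the sketch below checks three empirical identities that avoid the mean: the classical Gini-type form of the variance, its covariance analogue, and one pairwise-difference form of the third central moment. The third-moment identity follows from expanding the product for i.i.d. copies; it is chosen here for illustration and is not necessarily the representation derived in the paper. All function names are hypothetical.

```python
import random

def pairwise_variance(x):
    """(1 / (2 n^2)) * sum_{i,j} (x_i - x_j)^2, with no reference to the mean."""
    n = len(x)
    return sum((a - b) ** 2 for a in x for b in x) / (2 * n * n)

def pairwise_covariance(x, y):
    """(1 / (2 n^2)) * sum_{i,j} (x_i - x_j) * (y_i - y_j)."""
    n = len(x)
    total = sum((x[i] - x[j]) * (y[i] - y[j]) for i in range(n) for j in range(n))
    return total / (2 * n * n)

def pairwise_third_moment(x):
    """(1 / n^3) * sum_{i,j,k} (x_i - x_j)^2 * (x_i - x_k).

    Illustrative identity (an assumption of this sketch, not the paper's formula).
    For fixed i the double sum over j and k factorizes into two single sums.
    """
    n = len(x)
    total = 0.0
    for a in x:
        sq = sum((a - b) ** 2 for b in x)  # sum over j
        lin = sum(a - c for c in x)        # sum over k
        total += sq * lin
    return total / n ** 3

def central_moment(x, r):
    """Mean-based empirical central moment of order r, used only for comparison."""
    m = sum(x) / len(x)
    return sum((v - m) ** r for v in x) / len(x)

def mean_based_covariance(x, y):
    """Mean-based empirical covariance (divisor n), used only for comparison."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# Skewed sample data, so that the third moment is clearly nonzero.
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(60)]
y = [0.5 * v + rng.gauss(0.0, 1.0) for v in x]

print(pairwise_variance(x), central_moment(x, 2))
print(pairwise_covariance(x, y), mean_based_covariance(x, y))
print(pairwise_third_moment(x), central_moment(x, 3))
```

Each printed pair should agree up to floating-point error, since the pairwise forms are algebraically equal to the mean-based empirical moments with divisor n.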

Keywords

Moments

Covariance

Skewness

Kurtosis

Gini

Lagrange identity 

Co-Author(s)

Abderrahim Taamouti, University of Liverpool Management School
Meilin Tong, McGill University

First Author

Jean-Marie Dufour, McGill University

Presenting Author

Meilin Tong, McGill University

Misspecification in trivariate probit models with recursive structure and sample selection

This paper examines the consequences of model misspecification when data are generated from a trivariate probit model that accounts for recursive dependencies and sample selection. We investigate the estimation bias that arises under misspecified models that ignore the recursive structure, the sample selection, or both. In addition to providing theoretical results, we conduct Monte Carlo simulations to quantify the magnitude of the bias, not only in the parameters associated with the explanatory variables in the three equations, but also in the correlation parameters of the corresponding error terms. We highlight the risk of misinterpreting these correlation parameters, which could lead to invalid conclusions about the potential presence of selection bias. Our findings emphasize the importance of careful model specification in applications involving multiple binary outcomes, where selection bias and recursive structures can play an important role in shaping the results.

Keywords

trivariate probit models

sample selection

recursive models

Simulated Maximum Likelihood estimation 

Co-Author

MARCO A. PEREZ-NAVARRO, UNIVERSIDAD AUTÓNOMA DE MADRID

First Author

ROCIO SANCHEZ-MANGAS, UNIVERSIDAD AUTÓNOMA DE MADRID

Presenting Author

ROCIO SANCHEZ-MANGAS, UNIVERSIDAD AUTÓNOMA DE MADRID

Estimation and Prediction in Mis-specified Fractionally Integrated Models with an Unknown Mean

This paper investigates how mis-specification of the short memory dynamics affects estimation and prediction in a fractionally integrated model with an unknown mean. We derive the limiting distributions of three parametric estimators, namely exact Whittle, time-domain maximum likelihood, and conditional sum of squares, under common mis-specification of the short memory dynamics. We show that, given a consistent estimator of the mean, these three estimators converge to the same pseudo-true value and their asymptotic distributions are identical to those of the frequency domain maximum likelihood and discrete Whittle (DWH) estimators, which are mean invariant. We analyze the properties of a linear predictor under mis-specification, demonstrating that it is biased unless the true mean is zero and that the mean squared forecast error depends on the true and pseudo-true fractional differencing parameters. To support our theoretical findings, we conduct an extensive numerical exploration of these estimation methods. Our simulations reveal that the DWH estimator performs best in terms of bias and mean squared error and provides superior forecast accuracy when combined with the sample mean.

Keywords

conditional sum of squares

linear predictor

long memory model

maximum likelihood

mis-specification

pseudo-true value 

Co-Author(s)

Indeewara Perera, University of Sheffield
Donald Poskitt, Australian National University

First Author

Kanchana Nadarajah, University of Sheffield

Presenting Author

Kanchana Nadarajah, University of Sheffield

Design-based weighted regression estimators for conditional spillover effects

In a clustered interference setting, with networks collected within clusters and no interference between clusters, we introduce a general causal estimand for conditional spillover effects, offering flexible ways of integrating unit-to-unit spillover effects. This estimand makes it possible to assess the heterogeneity of a unit's spillover effect on its neighbors with respect to the unit's characteristics. Two weighted regression-based estimators are proposed: (i) at the individual level, taking neighbors' averages either in the outcomes or in the treatments within weights; and (ii) at the dyadic level, where the outcome of one unit is regressed on the treatment of each neighbor. When the covariates driving the heterogeneity are categorical, we prove the equivalence of the two regression-based estimators to the nonparametric Hájek estimator. For continuous covariates, we demonstrate that both estimators consistently estimate the proposed estimands. From a design-based perspective, we derive HAC variance estimators and establish a central limit theorem. We then apply our methods to a randomized experiment conducted in Honduras to evaluate the spillover effect of a behavioral intervention.

Keywords

causal inference in networks

design-based causal inference 

Co-Author(s)

Edoardo Airoldi, Temple University
Laura Forastiere, Yale University

First Author

Fei Fang, Yale University

Presenting Author

Fei Fang, Yale University

Model-assisted inference of staggered rollout cluster randomized experiments

Staggered rollout cluster randomized experiments (SR-CREs) are increasingly used for their practical feasibility and logistical convenience. These designs involve staggered treatment adoption across clusters, requiring analysis methods that account for an exhaustive class of dynamic causal effects, anticipation, and non-ignorable cluster-period sizes. Without imposing outcome modeling assumptions, we study regression estimators using individual data, cluster-period averages, and scaled cluster-period totals, with and without covariate adjustment, from a design-based perspective in which only the treatment adoption time is random. We establish consistency and asymptotic normality of each regression estimator under a finite-population framework and formally prove that the associated variance estimators are asymptotically conservative in the Löwner ordering. Furthermore, we conduct a unified efficiency comparison of the estimators and provide practical recommendations. We highlight the efficiency advantage of using estimators based on scaled cluster-period totals with covariate adjustment over their counterparts using individual-level data and cluster-period averages.

Keywords

Covariate adjustment

Causal inference

cluster-robust variance estimator

design-based inference

heteroskedasticity-consistent variance estimator

finite-population central limit theorem 

Co-Author

Fan Li, Yale School of Public Health

First Author

Xinyuan Chen, Mississippi State University

Presenting Author

Xinyuan Chen, Mississippi State University

Reducing False Discovery Rates for A/B Experiments in Google Cloud

Google Cloud uses A/B testing for launch decisions, relying on A/A tests to validate the A/B testing infrastructure. A key metric is initial page load latency, or the amount of time it takes each page to load all elements from start to finish. A series of A/A experiments revealed unexpectedly high false discovery rates (FDR) at the page-path level, even after applying common corrections such as Bonferroni adjustment. Drawing from genomics methodologies, we derived a new significance threshold using permutation tests. We randomly assigned users to "treatment" and "control" groups, calculated p-values for the 75th percentile latency nonparametrically, sorted all p-values, recorded the 1,000 smallest, and repeated this 10,000 times. This yielded the minimum p-value where the cumulative distribution function approached 0.05, returning FDRs to expected levels. We also evaluated the trade-off between significance thresholds and power by injecting hypothetical lifts. This solution was implemented in Google's internal A/B experiment tools. 
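The permutation idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the procedure implemented at Google: it tracks only the single smallest p-value per random split (rather than the 1,000 smallest), uses far fewer repetitions, and computes p-values with a Mood-type quantile test using a normal approximation, since the abstract does not say which nonparametric test was used. The data are simulated, and all names (`latency_by_path`, `min_p_threshold`, and so on) are hypothetical.

```python
import math
import random

def quantile(values, q):
    """Empirical quantile by the nearest-rank rule."""
    s = sorted(values)
    idx = max(0, math.ceil(q * len(s)) - 1)
    return s[idx]

def quantile_test_pvalue(a, b, q=0.75):
    """Two-sided p-value for equal q-quantiles in groups a and b.

    Mood-type test: compare the share of observations above the pooled
    q-quantile in each group (normal approximation; an assumption here).
    """
    cut = quantile(a + b, q)
    na, nb = len(a), len(b)
    xa = sum(v > cut for v in a)
    xb = sum(v > cut for v in b)
    pooled = (xa + xb) / (na + nb)
    se = math.sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
    if se == 0:
        return 1.0
    z = (xa / na - xb / nb) / se
    return math.erfc(abs(z) / math.sqrt(2))

def min_p_threshold(latency_by_path, n_users, n_perm=200, alpha=0.05, seed=0):
    """alpha-quantile of the smallest per-path p-value under random A/A splits."""
    rng = random.Random(seed)
    users = list(range(n_users))
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(users)
        treated = set(users[: n_users // 2])  # random "treatment" half
        ps = []
        for obs in latency_by_path.values():  # obs: list of (user_id, latency)
            a = [lat for u, lat in obs if u in treated]
            b = [lat for u, lat in obs if u not in treated]
            if a and b:
                ps.append(quantile_test_pvalue(a, b))
        if ps:
            min_ps.append(min(ps))
    return quantile(min_ps, alpha)

# Simulated A/A data: 20 page paths, 200 users, log-normal latencies.
data_rng = random.Random(1)
n_users = 200
latency_by_path = {
    f"/path/{k}": [(u, data_rng.lognormvariate(0.0, 1.0)) for u in range(n_users)]
    for k in range(20)
}
threshold = min_p_threshold(latency_by_path, n_users)
print(threshold)
```

Declaring a page path significant only when its p-value falls below `threshold` keeps the chance of any false positive across paths near `alpha` under the null, which is why the resulting cutoff sits well below the nominal 0.05.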

Keywords

online A/B experimentation

false discovery rate (FDR)

permutation testing

power analysis

Google Cloud

multiple comparisons 

First Author

Taylor Mattia, Google

Presenting Author

Taylor Mattia, Google