Data Fusion: Calculation of Feasible Correlations

Presented During: Non-probability samples, administrative records, and data fusion

Chris Moriarity First Author

Chris Moriarity Presenting Author

Sunday, Aug 3: 2:50 PM - 3:05 PM
1729
Contributed Papers

Music City Center

I have described a method (2001, 2003, 2004, 2009, 2010) for merging two independent samples using data fusion (also known as statistical matching). One sample contains (X,Z) and the other contains (X,Y), both drawn from a common nonsingular normal (X,Y,Z) distribution. Following Kadane (1978) and Rubin (1986), I employ regression in my approach. I assess the uncertainty that the merge introduces because the (Y,Z) relationship is unobserved by repeating the procedure over a range of (Y,Z) values that are consistent with the observed data. An essential part of my algorithm is to add random residuals to the regression estimates. My initial approach for estimating the residual variance could fail (yield a negative estimate) because it subtracted estimates obtained from the two different files. An innovation due to Raessler and Kiesl (2009) gives improved results for estimating the residual variance, solving one of the two open problems in the paradigm. The remaining open problem was determining the area of feasible correlations between Y and Z when both Y and Z are multivariate. Building on the foundation described by Kiesl and Raessler (2006), the solution to this problem is now known.
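
To illustrate what "feasible correlations" means, the following is a minimal sketch for the simplest case only, where X, Y, and Z are each univariate. It relies on the standard requirement that the 3x3 correlation matrix of (X,Y,Z) be positive definite, which bounds the unobserved corr(Y,Z) once corr(X,Y) and corr(X,Z) are known. It is not the multivariate solution referred to in the abstract, and the function names and input values are hypothetical, chosen for illustration.

```python
import math


def corr_determinant(rho_xy, rho_xz, rho_yz):
    """Determinant of the 3x3 correlation matrix of (X, Y, Z).

    Given valid pairwise correlations, a positive determinant means the
    matrix is positive definite (a nonsingular joint distribution).
    """
    return (1.0 + 2.0 * rho_xy * rho_xz * rho_yz
            - rho_xy ** 2 - rho_xz ** 2 - rho_yz ** 2)


def feasible_yz_interval(rho_xy, rho_xz):
    """Bounds on corr(Y, Z) given corr(X, Y) and corr(X, Z).

    Univariate case only (an assumption of this sketch). The endpoints
    are the roots of corr_determinant(...) = 0 viewed as a quadratic in
    rho_yz; values strictly between them are feasible.
    """
    if not (-1.0 < rho_xy < 1.0 and -1.0 < rho_xz < 1.0):
        raise ValueError("correlations must lie strictly between -1 and 1")
    center = rho_xy * rho_xz
    half_width = math.sqrt((1.0 - rho_xy ** 2) * (1.0 - rho_xz ** 2))
    return center - half_width, center + half_width


# Hypothetical observed correlations from the two files.
lo, hi = feasible_yz_interval(0.6, 0.8)
print(f"feasible corr(Y,Z) lies between {lo:.4f} and {hi:.4f}")
```

In this univariate setting, a sensitivity analysis of the kind described above could repeat the merge for a grid of corr(Y,Z) values inside the returned interval. When Y and Z are both multivariate, the feasible set is a region rather than an interval, and this sketch does not cover that case.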

Keywords

statistical matching

Main Sponsor

Survey Research Methods Section