Bayesian Propagation of Record Linkage Uncertainty into Subsequent Analyses


Record linkage is often performed with the ultimate goal of carrying out statistical analyses, so the merged dataset is merely an intermediate step. Because probabilistic record linkage procedures can introduce errors into the merged data, analyses based on a single merged data file that ignore the linkage uncertainty can yield overconfident results, and the impact of linkage errors on subsequent analyses should be acknowledged. We propose a procedure in which analysts carry out the analysis they are interested in on each of several plausible linkages and then combine the outputs from these analyses as a weighted average. The procedure is principled in the sense that it approximates the results of a proper joint model in which the linkage and the data analysis are performed simultaneously. The proposed approach also has several practical advantages, including facilitating analyses of confidential data.
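The combination step can be illustrated with a minimal sketch. It assumes that each per-linkage analysis is summarized by a posterior mean and variance for a single scalar quantity, and that the weights are supplied by the analyst (for instance, approximate posterior probabilities of each plausible linkage). This is an assumption for illustration, not the exact weighting used in the proposed procedure. If the weighted average of the per-linkage posteriors is treated as a mixture, the combined mean is the weighted mean of the per-linkage means. By the law of total variance, the combined variance is the average within-linkage variance plus the between-linkage spread of the means. The names `AnalysisOutput` and `combine_outputs` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AnalysisOutput:
    """Hypothetical posterior summary of one scalar quantity under one plausible linkage."""
    mean: float
    variance: float


def combine_outputs(outputs: Sequence[AnalysisOutput],
                    weights: Sequence[float]) -> AnalysisOutput:
    """Combine per-linkage posterior summaries as a weighted mixture (illustrative sketch)."""
    if not outputs or len(outputs) != len(weights):
        raise ValueError("need one non-negative weight per analysis output")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    # Normalize the weights so they sum to one.
    total = sum(weights)
    w = [wi / total for wi in weights]

    # Mixture mean: weighted average of the per-linkage posterior means.
    mean = sum(wi * o.mean for wi, o in zip(w, outputs))

    # Law of total variance: average within-linkage variance
    # plus the weighted spread of the means across linkages.
    within = sum(wi * o.variance for wi, o in zip(w, outputs))
    between = sum(wi * (o.mean - mean) ** 2 for wi, o in zip(w, outputs))
    return AnalysisOutput(mean=mean, variance=within + between)


if __name__ == "__main__":
    # Three hypothetical plausible linkages with made-up analysis summaries.
    per_linkage = [AnalysisOutput(1.0, 0.1),
                   AnalysisOutput(2.0, 0.2),
                   AnalysisOutput(3.0, 0.3)]
    combined = combine_outputs(per_linkage, weights=[1.0, 1.0, 2.0])
    print(combined)
```

In this toy example, reporting only the output of the single most heavily weighted linkage would give a variance of 0.3. The mixture variance is larger because the between-linkage term captures the uncertainty that a single merged file ignores.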