ATLAS: An automated association test using probabilistically linked health records with application to genetic studies

Abstract

Objective: Large amounts of health data are becoming available for biomedical research. Synthesizing information across databases with no gold standard mappings between records may provide a more complete picture of patient health and enable novel research studies. To do so, researchers may probabilistically link databases and conduct inference using the linked data. However, previous inference methods for linked data are constrained to specific linkage settings and exhibit low power. Here, we present ATLAS, an automated, flexible, and robust association testing algorithm for probabilistically linked data.

Materials and Methods: Missing variables are imputed at various thresholds using a weighted average method that propagates uncertainty from the linkage process. Next, an estimated effect size is obtained using a generalized linear model. ATLAS then conducts the threshold combination test by optimally combining p-values obtained from data imputed at varying thresholds using Fisher's method and perturbation resampling.

Results: In simulations, ATLAS controls type I error and exhibits high power compared to previous methods. In a real-world application study, incorporating linked data-enabled analyses using ATLAS yielded two additional significant associations between a rheumatoid arthritis genetic risk score and biomarkers.

Discussion: The ATLAS weighted average imputation weathers false matches and increases the contribution of true matches, mitigating the bias induced by linkage error. The ATLAS threshold combination test avoids the arbitrary choice of a threshold for declaring a match, thus automating linked data-enabled analyses and preserving power.

Conclusion: ATLAS promises to enable novel and powerful research studies using linked data to capitalize on all available data sources.
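
The two building blocks named in the Materials and Methods can be illustrated with a minimal Python sketch. It is not the ATLAS implementation: the function names, the (match probability, value) input format, and the rule of keeping only candidates at or above the threshold are assumptions made for illustration. The closed-form combined p-value shown here is the textbook Fisher reference distribution for independent p-values; p-values computed from the same data at different thresholds are dependent, which is presumably why ATLAS calibrates the combined statistic with perturbation resampling instead.

```python
import math


def impute_weighted_average(candidates, threshold):
    """Impute one missing value from candidate linked records.

    candidates: list of (match_probability, value) pairs (hypothetical format).
    Assumption: only candidates with probability >= threshold contribute, and
    each contributes in proportion to its match probability.
    """
    kept = [(p, v) for p, v in candidates if p >= threshold]
    total = sum(p for p, _ in kept)
    if total == 0:
        return None  # no candidate clears the threshold; value stays missing
    return sum(p * v for p, v in kept) / total


def fisher_statistic(p_values):
    """Fisher's combination statistic: -2 * sum(log p_i). Requires p_i > 0."""
    return -2.0 * sum(math.log(p) for p in p_values)


def fisher_combined_p_independent(p_values):
    """Combined p-value, valid only if the p-values are independent.

    Under independence the statistic is chi-square with 2k degrees of freedom,
    whose upper tail is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half = fisher_statistic(p_values) / 2.0
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)^i / i! incrementally
        series += term
    return math.exp(-half) * series


if __name__ == "__main__":
    # Made-up candidate matches for one record, imputed at several thresholds.
    candidates = [(0.9, 10.0), (0.3, 20.0), (0.05, 100.0)]
    for threshold in (0.2, 0.5, 0.95):
        print(threshold, impute_weighted_average(candidates, threshold))
    # Made-up per-threshold p-values, combined as if they were independent.
    print(fisher_combined_p_independent([0.05, 0.05]))
```

Raising the threshold in this sketch drops low-probability candidates from the average, so each threshold yields a different imputed dataset and hence a different p-value. Combining those p-values, rather than selecting one threshold, is the idea the abstract calls the threshold combination test.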

Publication
Journal of the American Medical Informatics Association, in press
Date