Description of Module "Record Linkage"

Overview

Data from epidemiological and clinical studies are valuable for investigating the burden, history and prognosis of diseases, the influence of risk factors (aetiology) and the effectiveness of preventive and therapeutic interventions at the individual and population levels. Linking different secondary and registry data sources with each other at the individual level can create synergies and answer research questions that cannot be addressed by approaches relying on a single data source.

However, record linkage in Germany is hampered by the absence of both a general standard for record linkage and a unique personal identifier. Instead, record linkage approaches differ with respect to data protection, required applications, data flow, methods used, identifiers considered and the underlying legal basis. To overcome this limitation, different approaches have been pursued in Germany, e.g. asking participants for informed consent or using linkage procedures based on indirect identifiers without informed consent. Each of the options for record linkage of health data in Germany has specific limitations (March et al. 2018) and is subject to quality issues such as linkage error (Harron et al. 2017). In addition, health databases (including registries and databases comprising secondary data) are often not interoperable, e.g. because of large methodological heterogeneity.

Nevertheless, both access to structured health data from registries and administrative health databases and the exchange and linkage of personal health data are crucial. To address this, this module formulates requirements for identifying and classifying the record linkage options offered by data sources. Through specific additions to the core metadata schema, which introduce the two new resource types “registries” and “secondary data sources”, the metadata set has been adapted to the data sources frequently used for record linkage.

Example:

A researcher may lack relevant data for his or her own cancer-related research question (e.g. information on a drug associated with cancer), because these data could not be collected in his or her own studies. He or she knows that health insurance companies routinely document these data for their insured persons. One way of combining individual-level data is record linkage, a technique for linking data belonging to the same person from different databases to form a new database for health research purposes. In this way, the researcher can either link the primary data from his or her cancer study to secondary data (i.e. health insurance data) or merge different registry or secondary data sources (e.g. cancer registry data with health insurance data). The latter data sources have the advantage that they reflect real-world conditions and are free of non-response and recall bias. Further examples of record linkage projects and studies are described in:

  • Dreger et al. (2020)
  • Kollhorst et al. (2022)
  • Siegert et al. (2016)
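The record linkage technique described in the example above can be illustrated with a minimal sketch of deterministic linkage on indirect identifiers, the kind of procedure mentioned in the overview. The sketch assumes hypothetical field names (birth_date, sex, postcode, claims_id, registry_id) and hypothetical toy records; it is not the procedure used in any of the cited studies, and it deliberately omits the pseudonymization, trusted-third-party data flow and other data-protection steps that a real linkage in Germany would require.

```python
from collections import defaultdict


def linkage_key(record):
    # Hypothetical indirect identifiers: date of birth, sex and postal code.
    # Light normalization so that "f" and "F " count as the same value.
    return (
        record["birth_date"].strip(),
        record["sex"].strip().upper(),
        record["postcode"].strip(),
    )


def deterministic_link(source_a, source_b):
    """Link records from two sources that agree exactly on the linkage key.

    Keys that occur more than once in either source are treated as
    ambiguous and left unlinked, to reduce false matches.
    """
    # Index the second source by linkage key.
    index_b = defaultdict(list)
    for rec in source_b:
        index_b[linkage_key(rec)].append(rec)

    # Count how often each key occurs in the first source.
    counts_a = defaultdict(int)
    for rec in source_a:
        counts_a[linkage_key(rec)] += 1

    linked = []
    for rec_a in source_a:
        key = linkage_key(rec_a)
        candidates = index_b.get(key, [])
        if counts_a[key] == 1 and len(candidates) == 1:
            # Merge both records into one record of the linked data set;
            # fields present in both take the value from source_b.
            linked.append({**rec_a, **candidates[0]})
    return linked


# Hypothetical toy data: health insurance claims and cancer registry records.
claims = [
    {"claims_id": "C1", "birth_date": "1950-03-01", "sex": "f", "postcode": "28359", "drug": "X"},
    {"claims_id": "C2", "birth_date": "1962-07-15", "sex": "m", "postcode": "28195", "drug": "Y"},
]
registry = [
    {"registry_id": "R1", "birth_date": "1950-03-01", "sex": "F", "postcode": "28359", "diagnosis": "C50"},
    {"registry_id": "R2", "birth_date": "1970-01-01", "sex": "M", "postcode": "28195", "diagnosis": "C18"},
]

linked = deterministic_link(claims, registry)
print([(r["claims_id"], r["registry_id"]) for r in linked])  # prints [('C1', 'R1')]
```

Leaving ambiguous keys unlinked is one simple way to accept more missed matches in exchange for fewer false matches. Probabilistic or classification-based methods (such as the approach in Siegert et al. 2016) relax the requirement of exact agreement, and in either case the resulting linkage error should be evaluated, for example along the lines described by Harron et al. (2017).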

References

Dreger S, Wollschläger D, Schafft T, Hammer GP, Blettner M, Zeeb H (2020). Cohort study of occupational cosmic radiation dose and cancer mortality in German aircrew, 1960–2014. Occup Environ Med 77(5), 285-291. https://doi.org/10.1136/oemed-2019-106165

Harron KL, Doidge JC, Knight HE, Gilbert RE, Goldstein H, Cromwell DA et al. (2017). A guide to evaluating linkage quality for the analysis of linked data. Int J Epidemiol 46(5), 1699-1710.

Kollhorst B, Reinders T, Grill S, Eberle A, Intemann T, Kieschke J et al. (2022). Record linkage of claims and cancer registries data—evaluation of a deterministic linkage approach based on indirect personal identifiers. Pharmacoepidemiol Drug Saf 31, 1287-1293. https://doi.org/10.1002/pds.5545

March S, Antoni M, Kieschke J, Kollhorst B, Maier B, Müller G et al. (2018). [Quo vadis data linkage in Germany? An initial inventory]. Gesundheitswesen 80(3), e20-e31.

Siegert Y, Jiang X, Krieg V, Bartholomäus S (2016). Classification-based record linkage with pseudonymized data for epidemiological cancer registries. IEEE Trans Multimedia 18(10), 1929-1941.

Authors (with affiliations) of the Record Linkage Module

  • Timm Intemann, Leibniz Institute for Prevention Research and Epidemiology (BIPS), Bremen
  • Manuela Peters, Leibniz Institute for Prevention Research and Epidemiology (BIPS), Bremen