UGR Integration

Background

The Unified Genomic Record (UGR) is a key technological component designed to unify patient genomic data into a single, patient-centric record. The UGR is intended to standardise the collection, storage, and sharing of genomic data across the NHS, ensuring that genomic information is interoperable and accessible across all care settings.

The UGR architecture is structured into three key layers:

  1. End User Functionality – Integrating with existing NHS services to facilitate direct interaction with the UGR.
  2. Interoperability – Serving as the primary gateway for accessing and sharing genomic data across different healthcare systems.
  3. Data Storage – Implementing a hybrid centralised and federated data storage model, leveraging the Patient Data Manager (PDM) as the central data entry point.

The key problems the UGR aims to address in support of delivering the Genomic Medicine Service are:

  1. Fragmented and Inconsistent Genomic Data Management, by bringing the genomic data into a single genomic record framework and enforcing UK Core FHIR interoperability standards. Access is centralised through APIM and MNS. Centralising genomic data access will help promote standards adoption by suppliers.
  2. Limited Reusability of Genomic Test Reports, by bringing the genomic data into a single genomic record framework and enforcing GA4GH and UK Core FHIR interoperability standards.
  3. Inefficient Data Use for Research and Population Health, by providing a centralised unified point of access for genomic data that is readily available for population health, management information and research, reducing the burden on GLHs and enabling more robust data analysis and insights.
  4. Limited Integration of Genomic Data with Clinical Pathways, by decoupling genomic data from the systems involved with the originating test request and making it available to any other provider alongside other clinical data, enabling more comprehensive patient management and facilitating the use of precision medicine across the NHS.
  5. High Administrative Burden and Operational Inefficiencies, through use of the genomic order management service. Adoption of standards and a unified point of data access will significantly reduce administrative workloads and improve the overall efficiency of genomic services.
  6. Inadequate Data Linkage for Inherited and Rare Diseases, by providing a simple mechanism for securely linking genomic records, respecting patient preferences, and enabling the implementation of targeted, family-based care strategies.
  7. Lack of Centralised System for Identifying Clinical Trial Eligibility, by potentially serving as a centralised virtual repository that enables researchers to identify eligible patient cohorts, enhancing patient access to innovative therapies and supporting the growth of clinical research.
  8. Absence of Centralised Access Control, by providing an opportunity to streamline access control processes, allowing patient-based policies instead of organisation or system-based policies. With the data decoupled from end-user systems, enforcement of policies at source is assured and includes comprehensive audit and transparency for patients, improving confidence and trust.
  9. Inconsistent Access to Pharmacogenomic (PGx) Data Across NHS, by providing the master PGx record for patients and making the data available nationally to any clinical decision support system involved in prescribing.
  10. Inability to Provide Comprehensive Patient Access and Transparency, by centralising access to the genomic record via the APIM. This significantly simplifies the integration of the UGR with the NHS App.

Genomic Data Reuse

One of the key aims of the UGR is to enable reuse of data across different health contexts: Direct Care; Population Health Management; Management Information; and Research. To support each context, slightly different data access approaches are needed.

Genomic data in direct care is mostly used as a diagnostic source. Data is shared using national standards and provider systems directly interact with the UGR via APIs. All direct care data is shared through legitimate access, managed by national RBAC, although data redaction and other minimisation approaches can be built into the API.

Population health management and management information require access to groups of patient genomic records. This can be achieved in two ways:

  1. Separate API calls to the PDM via APIM per patient record.
  2. Bulk Query API calls to the PDM using GA4GH Beacon API requesting sets of data across multiple patients.

Purpose-Based Access Control (PBAC) is implemented for each patient, respecting their preferences for secondary data sharing. Data returned via the API is pseudonymised using NHS England's national pseudonymisation system, or returned in the clear if used for commissioning purposes.

Using genomic data for research requires request-specific data processing. The UGR utilises Purpose-Based Access Control to ensure legitimacy of all data requests regardless of the clinical context. For research, the request must be supported by a PBAC policy within each patient’s UGR before their data can be queried as part of a cohort search. The PBAC rules can be defined to be as explicit as required. For example, a patient may permit their data to be used for cancer research, but request to be asked for permission for other types of research. The rules are defined, and patient preferences managed, via the NHS App.
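The cancer research example above can be sketched as a per-patient policy lookup. This is an illustrative sketch only: the `PbacPolicy` class, the `Decision` values and the purpose strings such as `research/cancer` are hypothetical and are not defined by the UGR; the real policy format and enforcement point are not specified here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PERMIT = "permit"
    ASK_PATIENT = "ask-patient"
    DENY = "deny"


@dataclass
class PbacPolicy:
    """Hypothetical per-patient policy: declared purpose -> decision."""
    rules: dict[str, Decision] = field(default_factory=dict)
    default: Decision = Decision.DENY  # applied when no rule matches

    def evaluate(self, purpose: str) -> Decision:
        return self.rules.get(purpose, self.default)


def cohort_eligible(policies: dict[str, PbacPolicy], purpose: str) -> list[str]:
    """Return the patient IDs whose policy permits the declared purpose."""
    return [
        patient_id
        for patient_id, policy in policies.items()
        if policy.evaluate(purpose) is Decision.PERMIT
    ]


# Patient A permits cancer research and wants to be asked about anything else.
# Patient B has opted out of cancer research.
policies = {
    "patient-a": PbacPolicy(
        {"research/cancer": Decision.PERMIT}, default=Decision.ASK_PATIENT
    ),
    "patient-b": PbacPolicy({"research/cancer": Decision.DENY}),
}

print(cohort_eligible(policies, "research/cancer"))
print(policies["patient-a"].evaluate("research/cardiology"))
```

Only patients whose policy explicitly permits the declared purpose are included in the cohort; an "ask the patient" outcome is treated as not yet permitted in this sketch.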

The returned data may be modified depending on the PBAC policies and the context:

  • Data minimisation – only return what is necessary for the functionality required.
  • Data redaction – make requester aware of data existing but respond with ‘data redacted’ in fields as defined in the PBAC rules.
  • Pseudonymisation – remove all patient identifiable information and provide an NHS England derived pseudonym for the patient ID.
  • Anonymisation – remove all patient identifiable information and ensure patient identity is unable to be derived from the patient data returned.
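The first three approaches above can be illustrated on a flat record. This is a minimal sketch under stated assumptions: the field names, the example record and the keyed-hash pseudonym are hypothetical stand-ins, and the hash is not NHS England's national pseudonymisation system. Anonymisation is omitted because it depends on the whole returned data set, not on a single record.

```python
import hashlib
import hmac


def minimise(record: dict, needed: set[str]) -> dict:
    """Data minimisation: keep only the fields the caller needs."""
    return {k: v for k, v in record.items() if k in needed}


def redact(record: dict, redacted_fields: set[str]) -> dict:
    """Data redaction: keep the field visible but replace its value."""
    return {
        k: ("data redacted" if k in redacted_fields else v)
        for k, v in record.items()
    }


def pseudonymise(record: dict, identifying: set[str], key: bytes) -> dict:
    """Pseudonymisation: drop identifying fields and add a derived pseudonym.

    The HMAC below is only a placeholder for a real pseudonymisation service.
    """
    out = {k: v for k, v in record.items() if k not in identifying}
    out["pseudonym"] = hmac.new(
        key, record["nhs_number"].encode(), hashlib.sha256
    ).hexdigest()
    return out


# Hypothetical flattened record, for illustration only.
record = {
    "nhs_number": "9449307539",
    "name": "Example Patient",
    "variant": "example-variant",
    "report_text": "example report text",
}

print(minimise(record, {"variant"}))
print(redact(record, {"report_text"}))
print(pseudonymise(record, {"nhs_number", "name"}, key=b"demo-key"))
```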

Scope

  • Order Management: The UGR will surface all test order requests and reports, and associated metadata provided by the order management services.
  • Digital Genomics Test Directory Services: The UGR may reference this service to validate test order requests and reports when storing them.
  • Specimen Data: The UGR will record sample metadata and reference any sample storage resources used as part of genomic diagnostics.
  • Sequence Data: The UGR will record and provide access to all data created from the DNA sequencing process. This applies to all genomic diagnostic modalities, including whole genome sequencing (WGS).
  • Family Linking: Linking UGR records is a key requirement and is within scope to support inherited disease diagnostics.
  • Purpose Based Access Control (PBAC): The UGR implements purpose-based access control at the data layer. Storing the access control policies and enforcing them through data sharing processes are both in scope.
  • Secondary uses: Storage and sharing of UGR data for population health management, management information and research.
  • National Genomic Research Library: Providing the structured data and transport into the NGRL from the UGR is in scope. Specific implementations must honour appropriate information governance.

The data output formats in scope are:

  • OMOP
  • FHIR
  • PLCM (TBC)
  • GA4GH

The data transports used to share data, which are in scope, are:


Architecture

Each patient will have a self-contained Unified Genomic Record. A simplified logical model can be expressed as a folder tree. Each folder holds relevant data and metadata.

UGRFolders

The data structure, storage type, and location within each folder may vary depending on the data requirements. The table below is an example of possible content of the UGR.

| Folder | Data Structure | Storage Type | Location |
|---|---|---|---|
| Demographics | JSON (FHIR) | FHIR Repository | NHS England |
| Genomic Test Request | JSON (FHIR) | FHIR Repository | NHS England |
| Genomic Test Reports | JSON (FHIR) | FHIR Repository | NHS England |
| Genomic Data | BAM, CRAM, VCF | Local/Cloud File Store | GLH or GEL |
| Family History | JSON (FHIR) | FHIR Repository | NHS England |
| Purpose Based Access Controls and Consent | JSON (CEDAR/FHIR) | FHIR Repository | NHS England |
| Audit | Logfile/FHIR | Cloud File Store/PARS | NHS England |

Genomic data is highly reusable, and it is possible to perform new genomic tests upon existing genomic data, e.g. through reanalysis and reinterpretation requests, negating the need to repeat specimen collection and the wet laboratory process. For this reason, test data can reference existing genomic data. The genomic data can be hosted in multiple places, and a FHIR DocumentReference resource can refer to a GA4GH Data Repository Service (DRS) location.


Data Storage Components

The UGR contains three primary classes of data, each necessitating a distinct approach to data management and storage:

  1. Structured data that captures the details of genomic test orders, sample processing, bioinformatics analyses, test reports, etc. This data is complex and highly structured (based on the HL7 FHIR standard), but low in volume compared to the primary genomic data.
  2. Unstructured data held in other general file formats such as CSV and PDF to support legacy systems incapable of consuming structured data.
  3. The large-scale data generated by DNA genotyping and sequencing technologies, such as the primary sequencing reads (in SAM, BAM or CRAM formats) and derived data such as variant calls (in VCF) and other data produced by bioinformatics analyses.

To accommodate this diversity and maximise functionality, multiple data repositories are utilised rather than a single physical repository. The diagram below shows the separate components required to support the UGR.

UGRDataStore

Central FHIR Store

It is expected that all FHIR resources will be stored in a national FHIR store, available through RESTful API access via the national API platform.

Federated Object Store

Not all genomics-related data is suitable for storage within a FHIR repository. To accommodate different data classes, multiple Federated Object Stores can be utilised, leveraging cloud object stores. To ensure consistency, the GA4GH Data Repository Service (DRS) protocol will be adopted as the standard retrieval mechanism for all data accessible via the UGR. The use of DRS enables the UGR to support both centralised and federated storage models, or a combination thereof. DRS URIs stored within FHIR DocumentReference resources for each patient will serve as the logical identifiers for all files hosted within the Federated Object Stores.
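As an illustration of how a client might follow a DRS URI held in a DocumentReference, the sketch below extracts `drs://` attachment URLs and maps a hostname-based DRS URI to the DRS object endpoint (`/ga4gh/drs/v1/objects/{id}`). The host `drs.example.org`, the object ID and the example resource are hypothetical; compact identifier-based DRS URIs, authentication and the follow-up access URL request are not covered.

```python
from urllib.parse import quote, urlparse


def drs_uris(document_reference: dict) -> list[str]:
    """Collect drs:// URLs from DocumentReference.content[].attachment.url."""
    urls = (
        content.get("attachment", {}).get("url", "")
        for content in document_reference.get("content", [])
    )
    return [u for u in urls if u.startswith("drs://")]


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI (drs://host/id) to its object endpoint."""
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    return (
        f"https://{parsed.netloc}/ga4gh/drs/v1/objects/"
        f"{quote(object_id, safe='')}"
    )


# Hypothetical DocumentReference pointing at a file in a federated store.
doc_ref = {
    "resourceType": "DocumentReference",
    "status": "current",
    "content": [
        {"attachment": {"url": "drs://drs.example.org/example-object-id"}}
    ],
}

for uri in drs_uris(doc_ref):
    print(drs_uri_to_url(uri))
```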

Bulk Query Interface

For population health management and research use cases, the UGR requires the ability to perform queries across larger sets of data. The APIs supporting these queries are typically read-only and may operate asynchronously to facilitate population-level analysis. The Bulk Query Interface component is designed to support these use cases, with the data made available potentially being transformed and reformatted to optimise bulk querying of an RDBMS source.


FHIR API

Composition

TODO: Add sequence diagram to demonstrate when Composition is created

The main resource type supporting implementation of the UGR will be the Genomics-Composition resource. This resource is used to align with existing Summary Care Record implementations and mirrors the EU Patient Summary guidance, whereby sections are defined for the data categories, which contain references to the data, e.g. Lab reports, Demographics etc.

It is expected the UGR could be represented as a section under a more general patient summary.

The sections included within the UGR are coded using https://fhir.hl7.org.uk/CodeSystem/UKCore-RecordStandardHeadings, as follows:

| Section Title | Code | Entry Resource Type |
|---|---|---|
| Patient demographics | patient-demographics | Patient (may be NHS Identifier if registered on PDS) |
| Investigations and procedures requested | investigations-and-procedures-requested | ServiceRequest |
| Investigation results | investigation-results | DiagnosticReport (this resource will link off to the various Genomic Data Files and Observations, Note: a separate section for genomic data and observations irrespective of the report/request which generated this is currently being investigated, e.g. for on demand CDS) |
| Consent for information sharing | consent-for-information-sharing | Consent |
| Family history | family-history | RelatedPerson/FamilyMemberHistory |

An example of a UGR record can be found at Composition-UGR-Example
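A client consuming the Composition will typically want to look up entries by section code. The following is a minimal sketch of that lookup; the `entries_by_section` helper and the trimmed example resource (including the `DiagnosticReport/example` reference) are illustrative assumptions, not part of the UGR specification.

```python
HEADINGS = "https://fhir.hl7.org.uk/CodeSystem/UKCore-RecordStandardHeadings"


def entries_by_section(composition: dict) -> dict[str, list[dict]]:
    """Index Composition.section[].entry by UK Core record heading code."""
    index: dict[str, list[dict]] = {}
    for section in composition.get("section", []):
        for coding in section.get("code", {}).get("coding", []):
            if coding.get("system") == HEADINGS and "code" in coding:
                index.setdefault(coding["code"], []).extend(
                    section.get("entry", [])
                )
    return index


# Trimmed, hypothetical Composition with two of the UGR sections.
composition = {
    "resourceType": "Composition",
    "section": [
        {
            "code": {"coding": [{"system": HEADINGS, "code": "patient-demographics"}]},
            "entry": [
                {
                    "identifier": {
                        "system": "https://fhir.nhs.uk/Id/nhs-number",
                        "value": "9449307539",
                    }
                }
            ],
        },
        {
            "code": {"coding": [{"system": HEADINGS, "code": "investigation-results"}]},
            "entry": [{"reference": "DiagnosticReport/example"}],
        },
    ],
}

index = entries_by_section(composition)
print(index["investigation-results"])
```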

Type and Category

To appropriately categorise the UGR alongside other Compositions and Documents, the type and category SHALL be fixed to the below.

"type": {
  "coding": [
    {
      "system": "http://snomed.info/sct",
      "code": "824321000000109",
      "display": "Summary record"
    }
  ]
},
"category": [
  {
    "coding": [
      {
        "system": "http://snomed.info/sct",
        "code": "321401000000106",
        "display": "Genomics"
      }
    ]
  }
],

Author and Custodian

As the UGR is created and managed by NHS England, the author and custodian elements will be fixed to the X26 ODS code.

"author": [
  {
    "identifier": {
      "system": "https://fhir.nhs.uk/Id/ods-organization-code",
      "value": "X26"
    }
  }
],
"custodian": {
  "identifier": {
    "system": "https://fhir.nhs.uk/Id/ods-organization-code",
    "value": "X26"
  }
},

Section

To better conform to the EU Patient Summary (EPS) Implementation Guide, section.text has been added to provide a human readable/HTML representation of the UGR sections. An example is provided below.

"section": [
  {
    "title": "Patient demographics",
    "code": {
      "coding": [
        {
          "system": "https://fhir.hl7.org.uk/CodeSystem/UKCore-RecordStandardHeadings",
          "code": "patient-demographics",
          "display": "Patient demographics"
        }
      ]
    },
    "text": {
      "status": "generated",
      "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">Pheobe Smitham, Female, DOB: 2013-09-27</div>"
    },
    "entry": [
      {
        "identifier": {
          "system": "https://fhir.nhs.uk/Id/nhs-number",
          "value": "9449307539"
        }
      }
    ]
  },

Other Fixed Elements

  • status SHALL have a fixed value of final
  • title SHALL have a fixed value of Unified Genomic Record Summary
  • confidentiality SHALL have a fixed value of R due to the sensitivity of the data within the UGR
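The fixed values for the UGR Composition (status, title, confidentiality, type, category, author and custodian) lend themselves to a simple pre-submission check. The sketch below is illustrative only: `fixed_element_problems` is a hypothetical helper and is not a substitute for validating against the published profile.

```python
SNOMED = "http://snomed.info/sct"
ODS = "https://fhir.nhs.uk/Id/ods-organization-code"


def has_coding(concept: dict, system: str, code: str) -> bool:
    """True if a CodeableConcept carries the given system/code pair."""
    return any(
        c.get("system") == system and c.get("code") == code
        for c in concept.get("coding", [])
    )


def is_x26(reference: dict) -> bool:
    """True if a Reference identifies NHS England by ODS code X26."""
    identifier = reference.get("identifier", {})
    return identifier.get("system") == ODS and identifier.get("value") == "X26"


def fixed_element_problems(composition: dict) -> list[str]:
    """Return a list of fixed-value violations (empty when all are met)."""
    problems = []
    if composition.get("status") != "final":
        problems.append("status must be 'final'")
    if composition.get("title") != "Unified Genomic Record Summary":
        problems.append("title must be 'Unified Genomic Record Summary'")
    if composition.get("confidentiality") != "R":
        problems.append("confidentiality must be 'R'")
    if not has_coding(composition.get("type", {}), SNOMED, "824321000000109"):
        problems.append("type must be SNOMED 824321000000109")
    if not any(
        has_coding(c, SNOMED, "321401000000106")
        for c in composition.get("category", [])
    ):
        problems.append("category must include SNOMED 321401000000106")
    if not any(is_x26(a) for a in composition.get("author", [])):
        problems.append("author must include ODS code X26")
    if not is_x26(composition.get("custodian", {})):
        problems.append("custodian must be ODS code X26")
    return problems


x26 = {"identifier": {"system": ODS, "value": "X26"}}
example = {
    "resourceType": "Composition",
    "status": "final",
    "title": "Unified Genomic Record Summary",
    "confidentiality": "R",
    "type": {"coding": [{"system": SNOMED, "code": "824321000000109"}]},
    "category": [{"coding": [{"system": SNOMED, "code": "321401000000106"}]}],
    "author": [x26],
    "custodian": x26,
}

print(fixed_element_problems(example))
```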

RelatedPerson

UGRs for family members are linked using Genomics-RelatedPerson resources, following the same profiling used by the Genomic Order Management Service. Note: The Order Management use case assumes family relationships are asserted/validated by the requesting clinician. However, analysis is needed to determine whether UGR familial relationships can be asserted and approved by the patient/related person themselves.

Family Members Without a UGR

FamilyMemberHistory resources MAY additionally be used to link clinical family history for individuals who do not have a UGR, such as deceased family members, those who live abroad, or individuals whose identity cannot be confirmed. When used, FamilyMemberHistory resources SHALL conform to the Genomics-FamilyMemberHistory profile guidance.

The PersonalRelationship resource, introduced in FHIR R5, is similar to the RelatedPerson resource as it provides a mechanism for expressing relationships between two individuals. However, unlike RelatedPerson, it does not represent an individual as a standalone clinical entity and instead models the relationship only. Although conceptually appealing for linking UGR records, the resource is of low maturity and not part of the normative content in R6. As such, PersonalRelationship SHALL NOT be used in the context of the UGR until further notice.

The GA4GH Pedigree standard provides a data model for representing family health history. This is equivalent to the Genomics use of the FamilyMemberHistory resource, with caveats around the FMH resource being limited to pairs of individuals rather than complete family units.


Data Access and Search Guidance

The Unified Genomic Record (UGR) is exposed as a FHIR Composition-based summary, structured into sections (e.g., demographics, investigations requested, results, consent, family history).

Direct care

For direct care, the UGR supports record retrieval only for a known patient. Patient discovery is out of scope and MUST be performed externally via PDS. API clients MUST supply an NHS Number or a local identifier (non-NHS Patient) previously registered via the Central Genomic Order Management System.

Supported search pattern:

Option A
Clients SHALL retrieve the UGR via the Composition resource and then resolve each section.entry reference as needed.

Required SearchParameters:

  • Composition.subject:identifier (NHS Number or local identifier)
  • Composition.status=final
  • Composition.type=824321000000109 (Summary record)
  • Composition.category=321401000000106 (Genomics)

Example:

GET /Composition?subject:identifier=https://fhir.nhs.uk/Id/nhs-number|9449307539&type=http://snomed.info/sct|824321000000109&category=http://snomed.info/sct|321401000000106&status=final&_sort=-date&_count=1
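When building this request in client code, the `|` separator in token values should be percent-encoded. A minimal sketch using Python's standard library follows; the `ugr_search_path` helper is hypothetical, and the base URL, authentication and required headers for the national API platform are not shown.

```python
from urllib.parse import urlencode

NHS_NUMBER_SYSTEM = "https://fhir.nhs.uk/Id/nhs-number"
SNOMED = "http://snomed.info/sct"


def ugr_search_path(nhs_number: str) -> str:
    """Build the relative search URL for a known patient's UGR Composition."""
    if not (nhs_number.isdigit() and len(nhs_number) == 10):
        raise ValueError("NHS Number must be exactly 10 digits")
    params = [
        ("subject:identifier", f"{NHS_NUMBER_SYSTEM}|{nhs_number}"),
        ("type", f"{SNOMED}|824321000000109"),      # Summary record
        ("category", f"{SNOMED}|321401000000106"),  # Genomics
        ("status", "final"),
        ("_sort", "-date"),  # most recent first
        ("_count", "1"),     # only the latest Composition
    ]
    # Keep ':' and '/' readable; '|' is encoded as %7C.
    return "/Composition?" + urlencode(params, safe=":/")


print(ugr_search_path("9449307539"))
```

Sorting by descending date with a count of one returns only the most recent matching Composition, mirroring the example request.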


Option B
Generate a complete UGR Document Bundle via the Composition/$document operation.

Cohort / Population Search

For population health, management information, and research, cohort access SHALL use bulk query APIs, separate from the direct care endpoints. Searches MAY be performed using cohort-level criteria (e.g. genomic variants, conditions), and results SHALL be pseudonymised or anonymised and meet minimum cohort size thresholds to prevent re-identification. The minimum threshold is under review (TBC).

To protect confidentiality, queries that risk identifying individuals (e.g. highly specific rare disease searches scoped to small geographic areas) SHALL be rejected. For example, a search for patients with a rare disease within a single postcode would be rejected.
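A minimum cohort size check could be applied before any results leave the bulk query interface. The sketch below is illustrative only: `QueryRejected` and `release_cohort` are hypothetical names, and the threshold is passed in as a parameter because the actual minimum is still to be confirmed.

```python
class QueryRejected(Exception):
    """Raised when a cohort query risks identifying individuals."""


def release_cohort(pseudonyms: list[str], min_cohort_size: int) -> list[str]:
    """Return the distinct pseudonyms only if the cohort is large enough."""
    distinct = sorted(set(pseudonyms))
    if len(distinct) < min_cohort_size:
        # Do not reveal the exact count, which could itself aid re-identification.
        raise QueryRejected("cohort is below the minimum size threshold")
    return distinct


# The threshold of 3 is a placeholder for demonstration, not a proposed value.
print(release_cohort(["p3", "p1", "p2", "p1"], min_cohort_size=3))

try:
    release_cohort(["p1"], min_cohort_size=3)
except QueryRejected as error:
    print(f"rejected: {error}")
```

Counting distinct pseudonyms rather than rows avoids a patient with several matching records inflating the cohort size.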

All queries MUST operate under Purpose-Based Access Control (PBAC), with explicit declaration of purpose (e.g. research, population health).