Business Rules

Matching Algorithms

The PPR uses deterministic and probabilistic algorithms to link distinct records for a given Provider Person or Location, received from different sources, under a single PPR-assigned Enterprise Provider Identifier (EPID). PPR also uses both deterministic and probabilistic algorithms to identify candidates that match the criteria provided by the PPR consumer in PPR queries. Deterministic algorithms are generally used for querying by identifier. The probabilistic algorithm is generally used to identify candidates where the PPR consumer has provided demographic data as the query criteria (e.g. Name, Address, Role).

Overview

The PPR linking and matching algorithm:

  1. Optimizes demographic and identifier data through data normalization for statistical comparison

  2. Finds all potential matches using threshold analysis (e.g. uses real data to define weights associated with the attributes and values)

  3. Scores via deterministic and probabilistic algorithm (refer to the sub-sections below)

  4. Returns the candidates in descending score order (i.e. highest to lowest)

Deterministic Scoring

Deterministic scoring takes into account the presence or absence of a value to derive a matching score against the PPR. Holistically, it represents an ‘all or nothing’ static score based on whether the value in question exactly matches the value in PPR.

PPR performs ‘exact match’ deterministic scoring (for linking records on add/update and identifying candidates on queries) on the following attributes (which, for matching, must be consistently supplied across different sources for the same provider to be used in the linking or matching score):

  1. UPI (Person & Location)

  2. License Number (Person)

  3. Stakeholder Number (Person & Location)

  4. Laboratory Services License Number (Location)

  5. Pharmacy Accreditation Number (Location)

  6. Facility Number (Location)

  7. Master Number (Location)
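
The ‘all or nothing’ behaviour described above can be illustrated with a minimal sketch. The weights, the identifier keys and the function name are assumptions made for illustration only; the actual PPR weights and data model are not stated in this document.

```python
# Hypothetical static weights per identifier type (not the real PPR values).
EXACT_MATCH_WEIGHTS = {
    "UPI": 10.0,
    "LicenseNumber": 8.0,
    "StakeholderNumber": 8.0,
}


def deterministic_score(query_ids: dict, record_ids: dict) -> float:
    """Sum the static weight of every identifier that matches exactly.

    An identifier contributes its full weight only when the query supplies
    it and the record holds an identical value; otherwise it adds nothing.
    """
    score = 0.0
    for id_type, weight in EXACT_MATCH_WEIGHTS.items():
        query_value = query_ids.get(id_type)
        if query_value is not None and query_value == record_ids.get(id_type):
            score += weight
    return score
```

In this sketch a near miss (e.g. one digit different) scores exactly the same as a missing identifier, which is the distinction between deterministic and probabilistic scoring.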

Probabilistic Scoring

PPR performs probabilistic scoring (often referred to as ‘fuzzy’ matching) for matching candidates on queries. The engine normalizes the data for comparison and scoring (e.g. dashes, area codes in phone numbers, and special characters in names are stripped for comparative purposes). The following string value attributes are used for probabilistic scoring. They are expected to be supplied across all source records for providers, and must be supplied consistently across different sources to be used in the linking or matching score:

  1. Name
        1. Person - FirstName, MiddleName, LastName, AliasFirstName, AliasMiddleName, AliasLastName
        2. Location - LegalName, Location Known Name

  2. Address (Person & Location) including:
        1. Street Address
        2. Municipality (Person & Location)
        3. Telephone (Person & Location)
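
The normalization mentioned before the attribute list could look roughly like the sketch below. It is an illustration under stated assumptions, not the PPR engine's actual rules: the function names are hypothetical, names are reduced to upper-case letters and digits, and phone numbers are assumed to carry no extension and to end in a 7-digit local number.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Upper-case a name and drop everything except letters and digits."""
    # Decompose accented characters so the base letter survives.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z0-9]", "", ascii_only).upper()


def normalize_phone(phone: str, local_digits: int = 7) -> str:
    """Keep digits only, then keep the trailing local number.

    Assumes the input has no extension; country and area codes are dropped.
    """
    digits = re.sub(r"\D", "", phone)
    return digits[-local_digits:]
```

With these assumptions, "(416) 538-3937" and "1-416-538-3937" both reduce to the same local number and can be compared directly.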

Probabilistic scoring takes into consideration how frequently a data value occurs within the distribution of an attribute when matching against the existing data in PPR. Holistically, it represents a variable match score based on the value in question matching the value in PPR to a measured degree, as per the following example criteria and thresholds:

Table - Probabilistic Scoring

Concept | Description
Frequency | Scores based on the frequency of an attribute value and not just whether there is a match (e.g. “John” would score less than “Heinrich” because John is a more common name in our jurisdiction).
Phonetics | Indexes words by sound/pronunciation to identify those that are similar phonetically (e.g. Purdey, Purdie, Purdy).
Equivalence (Nicknames) | Evaluates based on equivalency (e.g. for names: Liz, Libby, Beth = Elizabeth; for municipalities: Agincourt = Scarborough).
Field Transposition | Accounts for errors where name (First/Middle/Last) or address (StLine1/StLine2) sub-elements are transposed into the incorrect fields (e.g. Last + First vs. First + Last).
Edit Distance | Accounts for typos due to character transposition at the attribute value level (e.g. “Micheal”).
Historical Values | Performs queries against now inactive source data (e.g. Name, Address, Phone) to match on potential candidates based on historical data (e.g. name change as a result of marriage).
False Positive Filters | Prevents improper linkages from occurring when certain conditions are met (e.g. father & son).
Trusted Source | Identifies the regulatory colleges as trusted sources that contain authoritative source data attributes.
OR vs. AND Conditions | In general, PPR treats additional criteria as “OR” conditions, as opposed to “AND” conditions, because the goal of the MDM algorithm is to find a suitable match. Additional attributes provided in a query cast a wider search net, with the highest scoring candidates returned at the top of the candidate list (i.e. in descending order). Refer to the Minimum Score and Maximum Results sections for additional context.
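
To make the Frequency concept above concrete, here is a minimal sketch that weights a value by how rare it is. The log(total / count) formula is one common way to express this idea and is an assumption here, not the documented PPR weighting; the sample names are made up.

```python
import math
from collections import Counter


def frequency_weights(values: list[str]) -> dict[str, float]:
    """Give each distinct value a weight of log(total / count).

    Common values receive a small weight and rare values a large one.
    """
    counts = Counter(values)
    total = len(values)
    return {value: math.log(total / count) for value, count in counts.items()}


# Made-up sample of normalized first names, for illustration only.
sample = ["JOHN"] * 6 + ["MARY"] * 3 + ["HEINRICH"]
weights = frequency_weights(sample)
```

In this sample a match on "HEINRICH" carries more weight than a match on "JOHN", mirroring the example in the table.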

For candidate search queries, PPR totals the deterministic and probabilistic search scores together to generate the candidate confidence score. Note that negative scores may be configured for certain elements. Based on the degree of mismatch for the value, the negative score is applied to reduce the overall match score.

Refer to the Record Linking section for scoring scenarios.

Record Linking

PPR uses its deterministic and probabilistic scoring algorithm and record linking thresholds to determine whether a contributed record should be linked to an existing entity (and assigned that entity’s EPID) or not.

The question “Does this record match an existing entity?” can be answered with Yes, No or Maybe. In MDM, a defined range of scores identifies whether the answer should be Yes, No or Maybe. These ranges are bounded by two thresholds, “Clerical Review” and “Auto-Link”, and the answer is determined as follows:

        No = Match Score < Clerical Review Threshold

        Maybe = Clerical Review Threshold <= Match Score < Auto-Link Threshold

        Yes = Match Score >= Auto-Link Threshold

These scenarios are dictated by a linkage threshold scoring range for the member record’s total match score (deterministic + probabilistic scores) when compared against all the other member records in the PPR. A match score is calculated for every record added or updated in the PPR. Based on the match score, one of the following scenarios will occur:

  1. No Linkage (No Match): Leaves a record in its existing singleton entity based on a ‘low’ matching score below the “Clerical Review” threshold. The demographics on the record are so ‘thin’ or ‘sufficiently different’ that PPR cannot auto-link or potentially link it to an existing person entity.

  2. Potential Linkage (Maybe Match): Flags a record by creating a “Potential Linkage” task based on a ‘medium’ matching score at or above the “Clerical Review” threshold and below the “Auto-Link” threshold. The record in question retains its current entity ID value, either with other linked records or on its own as a singleton, until the Potential Linkage task is manually resolved by a Data Integrity analyst. The Data Integrity analyst will either link the record to another existing person entity or leave it as-is, based on a defined investigation process to determine whether the two records are in fact for the same provider.

  3. Automatic Linkage (Yes Match): Automatically links a record to an existing entity based on a ‘high’ matching score, i.e. at or above the “Auto-Link” threshold.
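
The three scenarios above can be sketched as a simple threshold check on the total match score. The function name and the threshold values used in the usage note are hypothetical; the configured PPR thresholds are not given in this document.

```python
def classify_linkage(deterministic: float, probabilistic: float,
                     clerical_review_threshold: float,
                     auto_link_threshold: float) -> str:
    """Map a total match score onto the No / Maybe / Yes outcomes."""
    if clerical_review_threshold > auto_link_threshold:
        raise ValueError("clerical review threshold must not exceed auto-link threshold")
    # Total match score = deterministic + probabilistic (either may be negative).
    total = deterministic + probabilistic
    if total >= auto_link_threshold:
        return "automatic linkage"
    if total >= clerical_review_threshold:
        return "potential linkage"
    return "no linkage"
```

For example, with assumed thresholds of 8 and 12, a total score of exactly 8 yields a potential linkage, and a negative probabilistic component can pull an otherwise high score back below the clerical review threshold.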

Supported Queries

The PPR applies the following rules when ranking possible matches against the query parameters received from consumers.

Search Provider Person

PPR supports searching for Practitioner candidates using Practitioner demographics as defined in Table 11 below. In each case:

• A maximum of 25 search results (candidate practitioners) will be returned for a single Practitioner EMPI Match request

Typically, consumers with limited demographic information (e.g. name and city) available will:

  1. Submit a Practitioner EMPI Match to return a list of potential matching Practitioners from the PPR.

  2. Use a Unique Provider Identifier (UPI) or another identifier (e.g. Licence Number) to submit a Practitioner Search to retrieve the full details for the desired Practitioner.
    Note that query criteria values provided in the message are ‘ORed’ together by the PPR and scored by the matching algorithm defined above to identify the list of candidate Practitioners.

Table-Search Provider Person Query Criteria

Query Criteria | Use

FHIR Operation: Practitioner EMPI Match – Option 1, Name & Municipality Search
Provider Last Name | Mandatory
Provider First Name | Optional
Profession code | Optional
Profession Status | Optional
Specialty / Sub-specialty code | Optional
Address (Municipality) | Mandatory
Provider Language | Optional

FHIR Operation: Practitioner EMPI Match – Option 2, Profession & Municipality Search
Profession code | Conditional: one of Profession OR Specialty must be provided at a minimum; both MAY be provided
Profession Status | Optional
Specialty / Sub-specialty code | Conditional: one of Profession OR Specialty must be provided at a minimum; both MAY be provided
Gender | Optional
Address (Municipality) | Mandatory
Provider Language Code | Optional
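
A consumer could check the mandatory and conditional criteria in the table above before sending a Practitioner EMPI Match. The sketch below is illustrative only; the dictionary keys (`last_name`, `municipality`, `profession_code`, `specialty_code`) and the function name are assumptions, not PPR message element names.

```python
from typing import Optional


def practitioner_match_option(criteria: dict) -> Optional[str]:
    """Return which search option the criteria satisfy, or None if neither."""

    def has(key: str) -> bool:
        value = criteria.get(key)
        return value is not None and str(value).strip() != ""

    if not has("municipality"):
        return None  # Municipality is mandatory for both options.
    if has("last_name"):
        return "Option 1: Name & Municipality"
    if has("profession_code") or has("specialty_code"):
        return "Option 2: Profession & Municipality"
    return None
```

A result of None means the request does not meet the minimum search criteria and should not be submitted.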

Get Provider Person Details

PPR supports retrieving the details of a specific practitioner by a defined Practitioner ID as defined in Table 12 below. If the Practitioner UPI/Practitioner Identifier (e.g. Licence Number) in question is not available in the local consumer system, the consumer can execute a Practitioner EMPI Match to query on various criteria to return the UPI/Practitioner Identifier of the provider.

If the UPI/Practitioner Identifier is known, the consumer system can directly execute any of the queries below:

Table – Get Provider Person Details Query Criteria

Query Criteria | Use

FHIR Operation: Practitioner Search – Option 1, UPI
ID Issuer (i.e. eHealth Ontario) | Mandatory
Unique Identifier Value (i.e. UPI) | Mandatory

FHIR Operation: Practitioner Search – Option 2, Licence Number
ID Issuer (i.e. Regulatory College) | Mandatory
Provider Person Primary Identifier Value (i.e. License Number) | Mandatory

FHIR Operation: Practitioner Search – Option 3, EPID
ID Issuer (i.e. eHealth Ontario) | Mandatory
EPID Value | Mandatory

FHIR Operation: Practitioner Read
ID Issuer (i.e. eHealth Ontario) | Mandatory
EPID Value | Mandatory
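
As an illustration of how an ID Issuer and identifier value pair might be carried in a Practitioner Search, the sketch below builds a search URL using the standard FHIR `identifier` token parameter (`system|value`). That PPR exposes the search this way is an assumption, and the base URL and issuer URI shown in the usage note are placeholders, not real PPR endpoints.

```python
from urllib.parse import urlencode


def practitioner_search_url(base_url: str, id_issuer: str, id_value: str) -> str:
    """Build a FHIR search URL using the `identifier` token parameter."""
    if not id_issuer or not id_value:
        raise ValueError("both the ID issuer and the ID value are mandatory")
    # FHIR token format is "system|value"; urlencode percent-escapes it.
    query = urlencode({"identifier": f"{id_issuer}|{id_value}"})
    return f"{base_url.rstrip('/')}/Practitioner?{query}"
```

For example, `practitioner_search_url("https://example.org/fhir", "https://example.org/id/upi", "123456")` produces a single query parameter carrying both mandatory values.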

Search Provider Location

PPR supports searching for provider location candidates using location demographic information as defined in the table below. In each case:

• A maximum of 100 search results (candidate providers) will be returned for Location EMPI Match

Typically, consumers with limited information available will:

  1. Submit a Location EMPI Match query to return a list of organizations/locations from the PPR.

  2. Use a Unique Provider Identifier (UPI) from the search results to submit a Location Search query to retrieve the details of the desired location.

Note that query criteria values provided in the message are ‘ORed’ together and scored by the PPR as per the algorithm definition above to identify and rank the candidate providers.

Table - Search Provider Location Query Criteria

Query Criteria | Use

Search Option: Location EMPI Match – Option 1, Name & Address (Municipality) Search
Location Name | Mandatory
Location Address (Municipality) | Mandatory
Location Postal Code | Optional
Location Status | Optional
Location Phone | Optional
Location Fax | Optional
Location Address (lines) | Optional

Search Option: Location EMPI Match – Option 2, Role & Address (Municipality) Search
Location Role | Mandatory
Location Name | Optional
Location Address (Municipality) | Mandatory
Location Address (Postal Code) | Optional
Location Status | Optional
Location Phone | Optional
Location Fax | Optional
Location Address | Optional

Search Option: Location EMPI Match – Option 3, LHIN Code & Role Search
LHIN Code | Mandatory
Location Role | Mandatory
Location Status | Optional
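
The mandatory combinations in the table above can be checked on the consumer side before a Location EMPI Match is submitted. This is a sketch under assumptions: the dictionary keys (`name`, `municipality`, `role`, `lhin_code`) and the function name are hypothetical.

```python
from typing import Optional


def location_match_option(criteria: dict) -> Optional[str]:
    """Return the first search option whose mandatory criteria are all present."""

    def has(key: str) -> bool:
        value = criteria.get(key)
        return value is not None and str(value).strip() != ""

    if has("name") and has("municipality"):
        return "Option 1: Name & Address (Municipality)"
    if has("role") and has("municipality"):
        return "Option 2: Role & Address (Municipality)"
    if has("lhin_code") and has("role"):
        return "Option 3: LHIN Code & Role"
    return None
```

When a request satisfies more than one option, this sketch reports the first one in table order; a real consumer may prefer to choose explicitly.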

Get Provider Location Details

PPR supports retrieving a specific Location as defined in the table below. If the Location UPI/Location Identifier in question is not available in the local consumer system, the consumer can execute a Location EMPI Match to query on various criteria and return the UPI/Location Identifier of the location.

If the UPI/Location Identifier is available, the consumer system can execute the Location Search using any of the following criteria:

Table - Get Provider Location Details Query Criteria

Query Criteria | Use

FHIR Operation: Location Search – Option 1, UPI
ID Issuer (i.e. eHealth Ontario) | Mandatory
Provider Location Primary Identifier Value (i.e. UPI) | Mandatory

FHIR Operation: Location Search – Option 2, Laboratory Services License Number
ID Issuer (i.e. eHealth Ontario) | Mandatory
Provider Location Primary Identifier Value (i.e. Laboratory Services License Number) | Mandatory

FHIR Operation: Location Search – Option 3, Pharmacy Accreditation Number
ID Issuer (i.e. eHealth Ontario) | Mandatory
Provider Location Primary Identifier Value (i.e. Pharmacy Accreditation Number) | Mandatory

FHIR Operation: Location Search – Option 4, EPID
ID Issuer (i.e. eHealth Ontario) | Mandatory
EPID Value | Mandatory

FHIR Operation: Location Read
ID Issuer (i.e. eHealth Ontario) | Mandatory
EPID Value | Mandatory

Practitioner Queries

Overview

Consuming systems can query for Practitioners in the PPR via:

  1. Get Queries: PPR EPID or Definitional Identifier (e.g. License Number) to ‘get’ the current demographic information and IDs associated with the provider.

  2. Search Queries: Demographics (e.g. Last Name + Municipality) to ‘search’ for candidate providers, returning their current demographics and ID list for each candidate (to support selection of the appropriate candidate from a ‘pick list’).

The following sections detail PPR get vs. search queries based on input criteria and output results. Note that output data in the PPR is dependent on the data density for a given provider as contributed across all sources of practitioner information at the attribute level.

Get Query

Retrieve all demographics and identifiers for a given provider by definitional identifier.

Table - Rules: Get Query

Summary Get by Provider ID (Definitional ID or EPID) to retrieve all demographics and IDs for a given provider.
Operation HL7 FHIR – Practitioner Read (by EPID) or;
HL7 FHIR – Practitioner Search (by Primary Identifier UPI or EPID)
Input (Criteria) A primary ID Issuer & corresponding ID value must be provided in the request.
For Practitioner Read:
1. PPR Enterprise Provider ID (EPID)
For Practitioner Search:
1. PPR Enterprise Provider ID (EPID), or
2. Authoritative Source Identifier (e.g. College Source URI + License Number), or
3. PPR Unique Provider Identifier (eHealth Source URI + UPI)
Note: If a given adopter only has demographic information available for the provider, they must perform a Practitioner EMPI Match.
Output (Results) Summary:
- Based on exact matching the PPR EPID or Definitional ID or UPI for a given active or merged member record, pull the entity view for a given provider.
- Response includes the entity view for that provider when they have at least one active member record in the entity.
- Returns the complete set of provincial demographics and list of identifiers as the ‘golden record’ for a given provider.
Details:
Returns the following list of provider person attributes:
PPR EPID
Unique Provider Identifier (UPI)
License Number
Stakeholder Number
Official Name
Alias Name
Gender
Communication Language
Telephone
Fax
Email Address
Web Address
Profession Classification Type
Profession Code
Profession Start Date
License Effective Date
Profession Active Indicator
Specialty / Sub-Specialty Code
Specialty / Sub-Specialty Start Date
Address
LHIN Number
LHIN Name
Telephone
Fax
Affiliation Classification Code
Affiliation Type Code
Affiliation Unique Provider Identifier (UPI)
Training Type Code
Training Institution Name

Search Query

Retrieves a summary of provincial demographics and identifiers for a list of candidates that match the provided demographic data.

Table - Rules: Search Query

Summary Search by demographics to retrieve practitioner demographics based on the search criteria provided. Provides the Practitioner profiles for up to 25 matched candidates for the consumer to present to an end user.
Operation HL7 FHIR – Practitioner EMPI Match
Input (Criteria) Search by Demographics (minimum criteria):
1. Last Name + Municipality, or:
2. Profession + Municipality
Note: If a given adopter has a definitional identifier available for the provider person (e.g. License Number), they must perform a Practitioner Search to get the desired information in a single call.
Output (Results) Summary:
• Based on probabilistic matching of the input parameters for a given active or merged member record, pull the composite view for a given provider
• Response includes the composite view for that provider when they have at least one active member record in the entity.
• Returns the complete set of demographics and identifiers as the ‘golden record’ for a given provider.
Details:
Returns the following list of provider person attributes:
PPR EPID
Unique Provider Identifier (UPI)
License Number
Stakeholder Number
Official Name
Alias Name
Gender
Communication Language
Telephone (s)
Fax (es)
Email Address (es)
Web Address
Profession Classification Type
Profession Code
Profession Start Date
License Effective Date
Profession Active Indicator
Specialty / Sub-Specialty Code
Specialty / Sub-Specialty Start Date
Address(es)
LHIN Number
LHIN Name
Telephone
Fax
Affiliation Classification Code
Affiliation Type Code
Affiliation Unique Provider Identifier (UPI)
Training Type Code
Training Institution Name

Location Queries

Overview

Consuming systems can query for provider Locations in the PPR via:

  1. Get Queries: PPR EPID or Definitional Identifier (e.g. Pharmacy Accreditation Number) to ‘get’ the current Location details and IDs.
  2. Search Queries: Demographics (e.g. Location Name (legal or Known) + Municipality) to ‘search’ for candidate Locations, returning their current Location details and ID list for each matched candidate (i.e. to support selection of the appropriate candidate from a ‘pick list’).

The following sections detail PPR Organization/location get vs. search queries based on input criteria and output results. Note that output data in the PPR is dependent on the data density for a given organization as contributed across sources at the attribute level.

Get Query

Retrieve all Location details and identifiers for a given Location by definitional identifier.

Summary Get by Provider ID (EPID or Definitional ID) to retrieve all provincial demographics and IDs for a given provider.
Operation HL7 FHIR – Location Read (by EPID), or;
HL7 FHIR – Location Search (by Primary Identifier, UPI or EPID)
Input (Criteria) A primary ID Issuer & corresponding ID value must be provided in the request.
For Location Read:
1. PPR Enterprise Provider ID (EPID)
For Location Search:
1. PPR Enterprise Provider ID (EPID), or
2. Authoritative Source Identifier (e.g. OCP Source URI + Pharmacy Accreditation Number), or
3. PPR Unique Provider Identifier (eHealth Source URI + UPI)
Note: If a given adopter only has demographic information available for the Location, they must perform a Location EMPI Match.
Output (Results) Summary:
• Based on exact matching the PPR EPID or Definitional ID or UPI for a given active or merged member record, pull the entity view for a given provider.
• Response includes the entity view for that provider when they have at least one active member record in the entity.
• Returns the complete set of demographics and list of identifiers as the ‘golden record’ for a given provider.
Details:
Returns the following list of provider Location attributes:
PPR EPID
Unique Provider Identifier (UPI)
Primary Identifiers (e.g. Stakeholder Number, Accreditation Number for Pharmacies)
Facility Number (FCN)
Master Number (MNI)
Location Legal Name
Location Common Name
Communication Language
Location Abbreviated Name
Location Type Code
Location Active Indicator
Location Operational Status Code
Location Operational Status Reason Code
Location Address
Location Telephone
Location Fax
Location Email Address
Location Web Address
Site ID
Address
LHIN Number
LHIN Name
Telephone
Fax
Email Address
Web Address
Affiliation Classification Code
Affiliation Type Code
Affiliation Unique Provider Identifier (UPI)

Search Query

Retrieves Location details and identifiers for a list of candidates that match the provided search criteria.

Table - Rules: Search Query

Summary Search to retrieve Location details and ID list for up to 100 matched candidates to present to a user to select a match.
Operation HL7 FHIR – Location EMPI Match
Input (Criteria) Search by Demographics (minimum criteria):
Location Name (legal or Known) + Municipality or
Role + Municipality or LHIN + Role
Note: If a given adopter has a definitional identifier available for the provider Location (e.g. UPI or OCP Accreditation Number), they must perform a Location Search to get the desired information in a single call.
Output (Results) Summary:
• Based on probabilistic matching of the input parameters for a given active or merged member record, pull the composite view for a given provider.
• Response includes the entity view for that provider when they have at least one active member record in the entity.
• Returns the complete set of Location details and identifiers as the ‘golden record’ for a given provider.
Details:
Returns the following list of provider Location attributes:
PPR EPID
Unique Provider Identifier (UPI)
Primary Identifiers (e.g. Stakeholder Number, Accreditation Number for Pharmacies)
Facility Number (FCN)
Master Number (MNI)
Location Legal Name
Location Common Name
Location Abbreviated Name
Location Type Code
Location Active Indicator
Location Operational Status Code
Location Operational Status Reason Code
Location Address
Location Telephone
Location Fax
Location Email Address
Location Web Address
Site ID
Address (es)
LHIN Number
LHIN Name
Telephone
Fax
Email Address
Web Address
Affiliation Classification Code
Affiliation Type Code
Affiliation Unique Provider Identifier (UPI)

Data Consumption

Agreements

PPR data consumption is bound to the following, as per direction from eHealth Ontario Privacy.

The following attributes will be masked for consumption for specific roles because they contain personal information:

  1. Death Indicator and Death Date

  2. Role Status

  3. License restrictions

  4. Mailing address

  5. Confidential addresses and contact information

Definition of these attributes in the specification is with the understanding that the values will be masked in PPR query responses.

Enterprise Provider ID (EPID)

In addition to the business attributes scoped per the agreement language above, PPR EPID must not be displayed on user/clinician/provider facing user interface screens or reports.

PPR adopters MUST also consider EPID as a point-in-time value to avoid unexpected results due to dynamic PPR record linking. The EPID must only be used to resolve a provider as part of a ‘real-time’ integration (e.g. a single session or a get query immediately following selection of a search result). Historical (static) EPID values must not be used for future integration, given they may change.

EPID values should not be persisted outside of the PPR for any reason.

Minimum Search Criteria

Minimum search criteria are implemented to provide optimum performance for PPR query response times.

Maximum Results

Maximum results are enforced to provide optimum performance for PPR query response times. A maximum of 25 persons or 100 Locations can be returned in a given search.

Minimum Score

PPR is currently configured to return candidates matched on any positive score (i.e. greater than zero).
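
Taken together, the Maximum Results and Minimum Score rules amount to a filter, a sort and a cap on the candidate list. The following is a minimal sketch of that combination, assuming candidates arrive as hypothetical (identifier, score) pairs; it is not the PPR implementation.

```python
# Result caps stated in the Maximum Results section.
MAX_RESULTS = {"person": 25, "location": 100}


def select_candidates(scored: list[tuple[str, float]],
                      kind: str) -> list[tuple[str, float]]:
    """Drop non-positive scores, sort highest first, and cap the list length."""
    positive = [item for item in scored if item[1] > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return positive[:MAX_RESULTS[kind]]
```

Because any positive score qualifies, broad queries can fill the cap with weak candidates; consumers should rely on the descending order rather than on mere presence in the list.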

Merged Records

Performing a get query on a definitional identifier (e.g. get by License Number) for a record merged in the PPR will return the surviving provider. Note that the surviving provider details may differ from the query criteria used to match on the merged record.

Phone Number Convention

Given the PPR linking and matching algorithm does not score on special characters (e.g. dashes) or spaces, and only scores on the digits in the local number, PPR currently does not enforce a specific phone number convention or format at the product level. As a result, and depending on the source provider data, phone numbers in PPR are split across database elements, or combined within, per the illustrative examples below:

Table - Phone Number Examples

Provider Identification Segment - Phone Elements | Example #1 | Example #2 | Example #3
Phone Number | | |
Telephone Number | | |
Telecom Use Code | PRN | WPN | WPN
Telecom Type | Fax | Phone | Phone
Email Address | | |
Country Code | 1 | |
Area/City Code | 416 | |
Local Number | 5555555 | (416)538-3937 | 14165383937 Ex: 4873
Extension | 1234 | Ext4873 |

PPR consumers must therefore implement logic to reconstruct and/or reformat the phone number across all the PPR returned phone elements. For example, a consumer must consolidate Country Code + Area Code + Local Number + Extension, or leverage the consolidated phone number field for FHIR consumption, and apply local rules accordingly. The logic to reassemble and/or format the phone number to assign in the local system will differ across adopters and their corresponding data standards. For example, for systems that only store digits without formatting, the rule would be to remove any non-numeric formatting characters from the concatenated phone string as only the phone digits will be persisted in the local system.
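
For the digits-only case described above, the reassembly rule can be sketched as follows. The function and parameter names are hypothetical and simply mirror the phone elements in the example table; each adopter's local rules will differ.

```python
import re


def digits_only_phone(country_code: str = "", area_code: str = "",
                      local_number: str = "", extension: str = "") -> str:
    """Concatenate the returned phone elements and keep digits only."""
    combined = f"{country_code}{area_code}{local_number}{extension}"
    return re.sub(r"\D", "", combined)
```

Note that the result still depends on what the source supplied: a record with no country code yields a shorter string than one where the country code is embedded in the local number, so further local rules may be needed to align them.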

Address Convention

Provider Identification Segment - Address Elements | Example #1 | Example #2 | Example #3
Street Address (aka Street Line1) | 777 Bay Street | 777 BAY ST. | 777 Bay Street
Other Designation (aka Street Line2) | 3rd Floor | 3RD FLOOR | 3rd Floor
City | Toronto | TORONTO | Singapore
State or Province | ON | ON |
Zip or Postal Code | M5W 7E6 | M5W7E6 | 270021
Country | CAN | CAN | SGP
Address Type | P | P | P

PPR consumers must therefore implement logic to reconstruct and/or reformat addresses across all the PPR returned address elements. The logic to reassemble and/or format addresses to assign in the local system will differ across adopters and their corresponding data standards. For example, a consumer requiring no space between Canadian postal code forward sortation area (e.g. M5W) and local delivery unit (e.g. 7E6) must strip out any spaces returned by PPR in the postal code element. Note that PPR currently only maps standardized state/province codes for Canada and the USA. International addresses may return null values for State/Province.
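
The postal code rule in the example above could be implemented along these lines. This is a sketch assuming the consumer stores Canadian postal codes without a space and identifies Canada by the country code "CAN" as shown in the example table; the function name is hypothetical.

```python
import re


def compact_postal_code(postal_code: str, country: str) -> str:
    """Remove whitespace from Canadian postal codes; only trim other countries."""
    trimmed = postal_code.strip()
    if country != "CAN":
        return trimmed
    # Join the forward sortation area and local delivery unit, e.g. "M5W 7E6".
    return re.sub(r"\s+", "", trimmed).upper()
```

Non-Canadian codes are left as returned (apart from trimming), since their formats vary and the State/Province element may be null for international addresses.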

‘User-at-the-Keyboard’ Identification

To support privacy inquiries into the disclosure of provider PI, the End User’s user name or mnemonic ID MUST be included in the PPR query message (via SAML) to identify the user who initiated the query. This is only required when a query is initiated by an actual user, as opposed to a system-to-system interaction (i.e. no disclosure to an individual user). Depending on the PPR interface implementation, this MUST be satisfied at the transport and/or functional message level(s).