Mapping working example

From Discovery Data Service

Latest revision as of 13:53, 27 May 2020

Manually mapping hundreds of fields and values can be extremely laborious and prone to error.

A mapping API ensures that previously authored mappings can be reused. Using the API achieves one of two things:

  1. Generates mapping information from maps already authored
  2. Creates a mapping task so that the source can be re-mapped once a map has been authored

The mapping algorithms start with a context provided by the application and, via a series of iterations, produce a map if one is present. The algorithms are further tuned for specific patterns found in some source fields and values, and perhaps for some authoring conventions used when the target concepts were created.
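The two outcomes listed above can be sketched as a small lookup. This is only an illustration and not the IM server's implementation: the `MappingStore` class, its method names and the tuple used as a key are all hypothetical, and the store is held in memory.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical key shape: (provider, system, table, field, value).
SourceKey = tuple


@dataclass
class MappingStore:
    """In-memory stand-in for a mapping service (illustrative only)."""
    maps: dict = field(default_factory=dict)   # authored maps, by source key
    tasks: list = field(default_factory=list)  # sources awaiting an author

    def request_map(self, key: SourceKey) -> Optional[dict]:
        """Return the authored map, or queue a mapping task and return None."""
        if key in self.maps:
            return self.maps[key]
        if key not in self.tasks:
            self.tasks.append(key)
        return None

    def author_map(self, key: SourceKey, target: dict) -> None:
        """Record an authored map and clear any outstanding task for it."""
        self.maps[key] = target
        if key in self.tasks:
            self.tasks.remove(key)


store = MappingStore()
key = ("Barts", "CernerMillenium", "AdmittedPatientCare",
       "PatientClassificationCode", 1)

first = store.request_map(key)    # nothing authored yet, so a task is queued
store.author_map(key, {"Class": ":CM_HospitalInpAdmitEncounter"})
second = store.request_map(key)   # the authored map is now returned
```

The first request returns nothing and leaves a task behind; once the map has been authored, the same request returns it and the task list is empty again.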


Requesting a map via the IM mapping API

Take the following working example. The client has requested information model maps for some source data. They wish to map directly to the DB schema, although in this case both the IM map and the DB response will be created.

 { "Mapping": [
       {
         "Request": "SourcetoDB",
         "OriginalSource": {
           "Provider": "Barts",
           "System": "CernerMillenium",
           "Context": {
             "id": 1,
             "Table": "AdmittedPatientCare",
             "Field": "PatientClassificationCode",
             "Value": 1
           }
         },
         "TargetDBSchema": "Compass_version1"
       }
   ]
 }

The context provided is that the provider is Barts hospital, the system is Cerner Millennium, and a file has been loaded into a table, "admitted patient care (APC)", for the purposes of further analysis. This file is a CDS file documented on the NHS Data Dictionary web site. A field with a value of 1 needs to be mapped. Until the mapping author has mapped the value, the field and value will remain as a generic extension.
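A client might assemble the request shown above along the following lines. This is a minimal sketch: the helper name `build_mapping_request` and its parameters are hypothetical, and only the JSON keys and values come from the example. How the payload is sent to the API is not described on this page, so the sketch stops at serialisation.

```python
import json


def build_mapping_request(provider, system, table, field, value,
                          target_schema, context_id=1):
    """Build the request body using the key names from the example above."""
    return {
        "Mapping": [
            {
                "Request": "SourcetoDB",
                "OriginalSource": {
                    "Provider": provider,
                    "System": system,
                    "Context": {
                        "id": context_id,
                        "Table": table,
                        "Field": field,
                        "Value": value,
                    },
                },
                "TargetDBSchema": target_schema,
            }
        ]
    }


request = build_mapping_request(
    "Barts", "CernerMillenium", "AdmittedPatientCare",
    "PatientClassificationCode", 1, "Compass_version1",
)
payload = json.dumps(request, indent=2)  # text that would be sent to the API
```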

There are a number of stages involved in the information model mapping server.

However, before considering the algorithms, it is worthwhile digressing to look at maps that may have already been created.

Mapping Source to the information model

[Figure: Hospital encounter ontology]

The mapping author knows that the likely entity in the data model would be an 'Encounter', and on further examination a number of specialised encounter types are available for selection, in particular a hospital inpatient stay. Given that the source table is "admitted patient care", this seems a good candidate.

[Figure: Admission to Inpatient]

From the NHS Data Dictionary specification it appears that the field "Patient classification code" is part of the admission of the patient. On further examination of the model, the hospital inpatient stay has sub-components for 'hospital admission' and 'discharge' entries. 'Hospital Inpatient Admission' appears to be a better fit.

[Figure: Admission classification]

Examining the admission encounter model further reveals a property of "admission classification of patient". Its value range includes 'ordinary admission', which carries a comment suggesting that it matches CDS code 1 for that field.

However, the admission encounter has a property of "is subcomponent of" another encounter, specifically the inpatient stay. It is beginning to look as though two entities are involved in the model, one of which is a subcomponent of the other. There is therefore a dependency between a subcomponent and its container component.

In fact the mapping author did not need to know this. The IM mapping algorithm detects the 'sub component' dependency because the data model includes a property of ":RM_isComponentOf", which is a subproperty of the 'dependent relationship property'. It therefore knows that it is going to generate a dependent class.

This means that the target class is dependent on the presence of another class. This information is added by the IM mapping server also.
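The detection step described above can be sketched as a walk up the property hierarchy. This is an illustration under stated assumptions rather than the IM server's algorithm: the page does not give an identifier for the 'dependent relationship property', so `:RM_dependentRelationship` is a hypothetical name, and the two dictionaries stand in for whatever ontology store the server actually uses.

```python
# Hypothetical identifier for the 'dependent relationship property'.
DEPENDENT_PROPERTY = ":RM_dependentRelationship"

# property -> parent property (subproperty hierarchy)
SUBPROPERTY_OF = {":RM_isComponentOf": DEPENDENT_PROPERTY}

# class -> {property: target class}, taken from the worked example
CLASS_PROPERTIES = {
    ":CM_HospitalInpAdmitEncounter": {
        ":RM_isComponentOf": ":DM_HospitalInpEntry",
    },
    ":DM_HospitalInpEntry": {},
}


def is_dependent_property(prop):
    """True if prop is the dependent relationship property or a subproperty."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == DEPENDENT_PROPERTY:
            return True
        seen.add(prop)  # guards against cycles in the hierarchy
        prop = SUBPROPERTY_OF.get(prop)
    return False


def dependent_relationships(cls):
    """List the dependencies a class carries, in the response's key names."""
    return [
        {"Relationship": prop, "Class": target}
        for prop, target in CLASS_PROPERTIES.get(cls, {}).items()
        if is_dependent_property(prop)
    ]


admission_deps = dependent_relationships(":CM_HospitalInpAdmitEncounter")
```

For the admission encounter this yields a single entry naming ":RM_isComponentOf" and ":DM_HospitalInpEntry", which is the information the server adds on the author's behalf.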

The mapping author then selects the appropriate property and value. Simply selecting the relevant mapped field value is enough for the mapping to be stored. The API could generate the following response had the client been interested in the IM target:

{ "IMTarget": {
        "Fromid": 1,
        "Class": ":CM_HospitalInpAdmitEncounter",
        "DependentRelationship": {
          "Relationship": ":RM_isComponentOf",
          "Class": ":DM_HospitalInpEntry"
        },
        "PropertyValue": {
          "Property": ":DM_admissionPatientClassification",
          "Value": "CM_AdmClassOrdinary"}}}

At an even earlier stage, the IM author had already generated an IM to DB map for encounters. In addition, when the target resource was originally authored, the extension table was authored.

{ "Mapping": [
      {
        "IMSource": {
          "Class": ":DM_EncounterEntry"
        },
        "TargetDBSchema": "Compass_V1",
        "DBTarget": {
          "Table": "encounter",
          "SubTypeField": "type",
          "ExtensionTable": "encounter_extension"}}]}

In other words, the IM data model class :DM_EncounterEntry maps to the encounter table for Compass version 1, and the encounter table has an extension table (which is dependent on the encounter table for its existence).

At this point, the IM server knows a fair bit about this source.

Nevertheless, the inbound transformer wants to go one step further and populate the actual implementation schema. Hence the request used the schema request fields ("Request": "SourcetoDB" together with "TargetDBSchema").

Mapping from the information model to the target schema

The mapping author has already created the IM schema map as above and has authored the target DB schema resources. In this case encounter subclasses all map to the table encounter, and the encounter table has an extension table, 'encounter extension'. Using the entity subtype attribute method, the author assigns the 'admission encounter' type as a value of the encounter "type" field.
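One way to picture the entity subtype attribute method is to walk up the class hierarchy until a class with a DB map is found, then record the original class as the value of the subtype field. The sketch below assumes that both encounter classes in this example are direct subclasses of :DM_EncounterEntry (the page only says that encounter subclasses map to the encounter table). The function name and dictionaries are hypothetical.

```python
# Assumed subclass links; the real hierarchy may have more levels.
SUPERCLASS_OF = {
    ":CM_HospitalInpAdmitEncounter": ":DM_EncounterEntry",
    ":DM_HospitalInpEntry": ":DM_EncounterEntry",
}

# (IM class, target schema) -> DB map, as in the IM to DB map shown earlier
IM_TO_DB = {
    (":DM_EncounterEntry", "Compass_V1"): {
        "Table": "encounter",
        "SubTypeField": "type",
        "ExtensionTable": "encounter_extension",
    },
}


def resolve_db_target(cls, schema):
    """Find the nearest mapped superclass and express cls as a subtype value."""
    current, seen = cls, set()
    while current is not None and current not in seen:
        db_map = IM_TO_DB.get((current, schema))
        if db_map is not None:
            return {
                "Table": db_map["Table"],
                "DependentField": {
                    "Field": db_map["SubTypeField"],
                    "Value": cls.lstrip(":"),  # DB values carry no prefix
                },
                "ExtensionTable": db_map["ExtensionTable"],
            }
        seen.add(current)  # guards against cycles in the hierarchy
        current = SUPERCLASS_OF.get(current)
    return None  # no map authored for this class or its ancestors


admission_target = resolve_db_target(":CM_HospitalInpAdmitEncounter",
                                     "Compass_V1")
```

The design choice here is that subclasses need no map of their own: they inherit the table from the encounter class and differ only in the value written to the "type" field.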

The admission encounter type had also previously been mapped to encounter type but, as there is a dependency in the information model between the admission and the inpatient encounter, the dependency field had also been mapped.

Thus additional information can now be added to the mapping response.

{ "DBTarget": {
        "Fromid": 1,
        "DependentRelationship": {
          "Relationship": "encounter_id",            // the entry is dependent on the existence of the encounter table (i.e. the admission encounter)
          "Table": {
            "DependentRelationship": {
              "Relationship": "parent_encounter",     // the encounter table is also dependent on its parent encounter (i.e. the inpatient stay encounter)
              "Table": {
                "Table": "encounter",
                "DependentField": {                    // not only must the inpatient stay encounter exist, it must also be assigned the type "inpatient entry"
                  "Field": "type",
                  "Value": "DM_HospitalInpEntry"}}},
            "Table": "encounter",
            "DependentField": {
              "Field": "type",
              "Value": "CM_HospitalInpAdmitEncounter" } } },  // the encounter must be of type 'Hospital inpatient admission'
        "Table": "encounter_extension",                       // this is the actual table being populated
        "FieldValue": [
          {
            "Field": "property",
            "Value": "DM_admissionPatientClassification"      // the code to be placed in the property field of the extension table
          },
          {
            "Field": "value",
            "Value": "CM_AdmClassOrdinary" }]}}             // the code to be placed in the value field of the extension table
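A transformer receiving this response has to satisfy the dependencies from the inside out: first the inpatient stay encounter, then the admission encounter, and only then the extension row. The sketch below flattens the nested response into that order. It is an illustration only: `plan_rows` and the shape of the returned steps are hypothetical, the comments are dropped so that the structure is plain data, and the extension table name is spelled as in the earlier IM to DB map.

```python
DB_TARGET = {
    "Fromid": 1,
    "DependentRelationship": {
        "Relationship": "encounter_id",
        "Table": {
            "DependentRelationship": {
                "Relationship": "parent_encounter",
                "Table": {
                    "Table": "encounter",
                    "DependentField": {"Field": "type",
                                       "Value": "DM_HospitalInpEntry"},
                },
            },
            "Table": "encounter",
            "DependentField": {"Field": "type",
                               "Value": "CM_HospitalInpAdmitEncounter"},
        },
    },
    "Table": "encounter_extension",
    "FieldValue": [
        {"Field": "property", "Value": "DM_admissionPatientClassification"},
        {"Field": "value", "Value": "CM_AdmClassOrdinary"},
    ],
}


def plan_rows(node):
    """Return the rows that must exist, deepest dependency first."""
    steps = []
    dep = node.get("DependentRelationship")
    if dep is not None:
        steps.extend(plan_rows(dep["Table"]))  # the dependency comes first
    fields = {}
    if "DependentField" in node:
        fields[node["DependentField"]["Field"]] = node["DependentField"]["Value"]
    for pair in node.get("FieldValue", []):
        fields[pair["Field"]] = pair["Value"]
    steps.append({
        "table": node["Table"],
        "fields": fields,
        # field on this row that points at the row planned just before it
        "link_field": dep["Relationship"] if dep is not None else None,
    })
    return steps


plan = plan_rows(DB_TARGET)
```

Each step names a table, the field values to set, and the field (if any) that links the row to the one planned immediately before it.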