Data map API: Difference between revisions

From Discovery Data Service
(55 intermediate revisions by the same user not shown)
<span style="color:red"> This page is under review </span>[[File:Information model packages - mappings.png|thumb]]The mappings package is one of the 5 main component categories in the information model.
[[File:Information model packages - mappings.png|thumb]]The mappings package is one of the 5 main component categories in the information model.


The data map API is an API provided by the information model service to aid with the transfer of data between two data sets, with the source data set obtained from a publisher and the target data set reflecting the common data model.   
The data map APIs are a set of APIs provided by the information model server to aid with the transfer of data between two data sets, where the source and target data sets each conform to a data model. The source data model, the destination data model, or both may be implemented as relational, graph, or both.


__TOC__
The APIs cover two main types of activities:


For the information model to be even more useful than simply a reference, it is helpful to be able to map published data to the common model, and map the common model content to implemented databases. This provides a mechanism of resolving many to many relationships between a source and a destination whilst at the same time 'cleaning' the data on the way to provide a standard.  
# The retrieval of mapping files for use by a transform module for the full transform
# The mapping of properties and values to aid with existing transforms.


In order to support this, there are two main data resources required.  
The extent to which these are used by a transform module is determined by the degree of knowledge already built into a transform, and the nature of the transformation software. The maps may simply be used as human-readable documentation, they may be used to enhance a current transform, or they may be used to drive a generic transformation engine.


# A map that takes some source published data and outputs the data in the common model format.
== Background and use cases ==
# A map that takes some common model data and outputs the data to a database that holds an actual implementation of part of, or the whole of, the model.


For this to operate, a mapping server API is required so that a client wishing to transform some data from A to B, is able to obtain the information in a computable manner.  
The Discovery information model supports a consistent approach to the presentation of maps using a mapping syntax heavily influenced by the [https://www.w3.org/TR/r2rml/ W3C R2RML] language and the ORM JPA approach as used by open source systems such as [http://hibernate.org/orm/ Hibernate.]


== Fundamentals ==
The use of a mapping server enables the separation of concerns between the maintenance of the data maps themselves and the applications that transform data. As well as the technical separation, it separates the business of map production from the business of transformation.
[[File:Mapping pipeline.jpg|alt=|thumb|Mapping pipeline ]]
At its most basic level the objective of a mapping process is to take some source ''value'' and produce a target ''structure'', which provides information about the class, property, and target value that the source value maps to.


For the source value to be understood, it is necessary to provide some ''context'' to it. All values are therefore set in some form of context. As a minimum this context would be the source property, i.e. the property that the value belongs to. In many cases, though, the context will be much more extensive.
The mapping server can be used by a transform module operating as a client via the mapping APIs.  


For example, in a source system, the word 'negative' may be a value of the field 'result text', associated with the code value '12345' in the field 'test', which is a field of the table 'clinical events', used by the system 'Cerner Millennium', in the hospital 'Barts NHS trust'. In that context it may mean something completely different from 'negative' with the code '12345' in another hospital, even one using the same system.
Unlike many transform tools, the mapping server does not provide run time mapping code. The server assumes that the transform client is technically able to map X to Y, but may not know what the map actually is. Use of the mapping server enables the transform client module to operate using a generic mapping process as long as the module can support the technology that X and Y data exists in.  
A further layer of context includes the ''Domain'' in which the mapping takes place. A map generated for one purpose may be different when generated for another purpose.  


For example, when processing published data into Discovery, the domain in question could be described as  the 'inbound publisher mapping domain'.
The main challenge in transforming data is not so much the software as the knowledge of what maps to what, and how. This knowledge usually resides in the domain of informatics. The conventional approach of writing down this knowledge in documents or spreadsheets is helpful but insufficient. It is much better if this knowledge is presented in a form that is both machine readable and human readable, and better still if the mappings can be used to parameterise a generic mapping application.


Consequently, the implicit idea of a context and a domain, is explicitly modelled as a 'Domain mapping node' object. The class,  and the various supporting classes are described in the following sections.  
There are many different mapping techniques in operation. In particular, object relational mapping tools based on this separation already exist, and there is a scattering of standards in this area. Thus, the approach in the Discovery mapping service draws on the ideas inherent in the [https://www.w3.org/TR/r2rml/ W3C R2RML] and ORM open source systems such as [http://hibernate.org/orm/ Hibernate.]


== Mapping Convergence ==
The APIs are designed to be used as an assistance to a code based ORM like transform, and can thus be used for some or all of the process, depending on the requirements of the transform client. In other words, the use of the APIs is a pick and mix approach, making it possible to introduce it into currently operating transformations without disruption. 


When processing data along the pipeline, mapping can be seen as a set of interconnected nodes, each node being triggered by the passing in of one or more source objects, or the outputs from other nodes, finally resulting in the output of a target object for use by the calling application.
The mapping API provides 3 levels of mapping 


Mapping convergence is the means by which there is an attempt to rationalise the huge number of source types to a fewer number in order to make map authoring simpler and more efficient.
# Full schema mapping.  The mapping server operates like a SatNav by providing both a map and a set of instructions as to how to use it. It nevertheless keeps the control of the process within the code space of the transform application. It uses similar constructs to the data set definition language i.e. a query syntax, albeit simplified. 
# Property mapping. The server maps a class/ property or table/ field to a property concept. This is useful for things like FHIR when working out the nature of an extension or key value pair 
# Value mapping. The means by which a legacy code or piece of text, is mapped to a core ontology concept, the mapping dependent on the context of the legacy code or term.  


For example, let us say that we are trying to map a drug code from an EMIS drug issue to the common model. We understand that the code comes from Chrisp Street Health Centre, which uses EMIS Web, and the source table in question is the Prescribing issue table,  with field of Codeid, and the code value is '12345'.
The above correspond to the 3 main APIs: Get map, Get mapped property, and Get mapped value.
[[File:Mapping sequences.png|thumb|Use of the two mapping APIs]]
There are 3 main uses of the get map API:


Likewise, we are also trying to map a drug code from a TPP practice source. We understand that the code comes from the Parkdale medical centre, that they use TPP SystmOne, that the source table in question is the PrimaryCareMedication table , the field is code and the id is 232000001108.
* Relational to object mapping (MR2O) . Obtain a map which describes how to create or reference target objects from a set of source tables and fields, and how to map their values.
*Object to relational mapping (MO2R). Obtain a map which describes how to create or reference a target database from a set of source objects and how to map their values.
*Object to object mapping (MO2O). Obtain a map which describes how to create or reference target objects from a set of source objects and how to map their values.


We also know that EMIS provides a look up table between their code and DM&D, but SystmOne provides the DM&D code itself.
The different approaches to the above maps are designed to address the problem of [[wikipedia:Object-relational_impedance_mismatch|object relational impedance mismatch]], which is the problem of moving between rows/columns and types/subtypes/properties.


The first thing we recognise is that the contexts of the above two sources appear to be nearly equivalent. Both are going to end up in the same target class and both will end up with the same target value when mapped from DM&D. There is a variation in EMIS in that, before getting to exactly the same context, there is a prior step to perform: the map between EMIS 'X' and the DM+D 'Y'. However, if that mapping were to occur first, then the two contexts would be exactly equivalent.
The main use of the Get mapped property API is for assigning the common data model properties when processing legacy data from non standard sources.


It appears that there is some form of convergence from two sources. This can be illustrated in the following way:
The Get mapped value API has one main use: given a codeable value in the source (e.g. a code or text), taken together with the context of that value, it obtains the common information model target concept (or code).
[[File:Convergent context.jpg|center|thumb|800x800px]]
 
Thus the first thing to do may be to try and converge to a common node, this node being a map node 


== Mapping node chain ==
== General approach to mapping ==
When the mapping API is called, the mapping service looks for a mapping node to trigger, the node being the one that accepts the source properties sent in by the client application as inputs. As each node is triggered, maps occur until such time as there are no further nodes to trigger and the output has been reached.  
[[File:2 step map.jpg|thumb|Choice of mapping steps]]Discovery maps from a source only to the common information model's data model, and from the common data model to a target i.e. two mapping steps, often combined into one from the client perspective.


When authoring a map, the map author chains together a series of actions and calls the relevant nodes with the relevant source data properties (or simply passes on the results of a previously triggered node). Thus a chain is built up.
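The chaining described here can be pictured as a simple loop that keeps triggering nodes until no further node has its inputs available. The following Python sketch is illustrative only: the `Node` structure, the `run_chain` function, the node name `:MN_FinalDrugMap` and the look-up values are all hypothetical and are not part of the mapping service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    iri: str
    inputs: List[str]  # properties that must be available before the node fires
    action: Callable[[Dict[str, str]], Dict[str, str]]  # returns new outputs


def run_chain(nodes: List[Node], source: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Trigger nodes until no untriggered node has all of its inputs available."""
    available = dict(source)
    triggered: List[str] = []
    progress = True
    while progress:
        progress = False
        for node in nodes:
            if node.iri in triggered:
                continue
            if all(name in available for name in node.inputs):
                available.update(node.action(available))
                triggered.append(node.iri)
                progress = True
    return available, triggered


# Hypothetical stand-in for the EMIS code id to DM+D look-up.
EMIS_TO_DMD = {"12345": "dmd-0001"}

nodes = [
    Node(":MN_EMISCodeIdLookUp", ["CodeId"],
         lambda ctx: {"Drug": EMIS_TO_DMD[ctx["CodeId"]]}),
    Node(":MN_FinalDrugMap", ["Drug"],
         lambda ctx: {"Value": "concept-for-" + ctx["Drug"]}),
]

result, fired = run_chain(nodes, {"CodeId": "12345"})
```

The second node only fires because the first node has added `Drug` to the available properties, which is the sense in which one node passes on the results of a previously triggered node.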
The benefit of this approach is that only one outbound map is required for each type of target (map from the common model) and only one inbound map for each source (map to the common model). The common data model acts as the hub that can "clean" and organise the data in a standardised way.
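As a rough illustration of this benefit, the sketch below compares the number of maps needed with and without the common model as a hub. It assumes, for the sake of the arithmetic, that a direct approach needs one map for every source and target pair; the function names and the figures are hypothetical.

```python
def hub_map_count(sources: int, targets: int) -> int:
    """One inbound map per source plus one outbound map per target."""
    return sources + targets


def direct_map_count(sources: int, targets: int) -> int:
    """Assumed worst case: one map for every source/target pair."""
    return sources * targets


# Example figures only: 10 publisher sources and 4 target implementations.
via_hub = hub_map_count(10, 4)
direct = direct_map_count(10, 4)
```

With these example figures the hub approach needs 14 maps, against 40 for direct source-to-target mapping.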


Because of the convergence concepts, it is likely that the same node will be used in many different mapping chains, and it is equally likely that the same chains will be used by different source systems. It is possible to configure maps that ignore particular source identifiers. For example to process an EMIS observation entry, the provider id would be ignored as the only relevant context is the type of system. The provider property would be set to ALL in the mapping configuration.
This approach also enables mapping clients to make use of either or both mapping steps, separately or grouped, depending on the use case. For example, currently implemented mapping processes that map to and from FHIR may only require a few additional IM facilities, such as what to do with new provider fields and what to put in a generic FHIR name/value pair extension.


=== Mapping node class ===
The approach to the Discovery syntax is in line with the information modelling language, i.e. it uses class definitions to define the language syntax, which provides explicit labelling of the constructs. In line with the modelling language, a pure token based approach is also supported.
[[File:MapClass.jpg|thumb]]
'Domain mapping node' sums up the idea that a number of different source structures can in the end converge and map to a single target structure, by dint of the common domain in which the map occurs and the common context that a particular set of sources share.


The primary object is the 'Domain mapping node', which has a uniquely identified context to which a number of sources map. Each different source is referred to as an 'Input slot', and all slots must be filled for the map node to be triggered.
== Use of Context ==
Context is used throughout the mapping model. Context indicates the context in which a particular data map artefact operates. The same map set in a slightly different context may operate in a different way. For example a mapping of X from system A from provider B, may be different from the mapping of X from system A from provider C. Context differentiates the two.


A mapping node is authored over time, perhaps initially by selecting a single source, but then recognising that additional sources are able to converge.
Context is used by the server in two ways:


In the above example, the author has decided that there are similarities between the EMIS and TPP prescription tables. A node can be created with a set of properties that indicate sufficient convergence to know that the output is a class and property. For example:
* As a means of identifying slight differences between seemingly similar maps i.e. a different context, thus avoiding the wrong map.
<div class="toccolours mw-collapsible mw-collapsed">
An example mapping node configuration that deals with some inbound medication
<div class="mw-collapsible-content">
<syntaxhighlight lang="json">
{ "MapNode": [
      {"IRI": ":MN_GPMedRequests",
        "Domain": ":InboundPublisherMaps",
        "Description": "Common convergence for source GP medication issues",
        "MapSlot": [
          {
            "Property": "Source_Provider"},          //No Value therefore any provider will do
          {
            "Property": "System",
            "Value": [ "EMISWeb","SystmOne"]},
          {
            "Property": "Table",
            "Value": ["PrescriptionIssue", "SRPrimaryCareMedication"]}],
        "AddOutput": {
          "Class": ":DM_MedicationRequest",
          "Property": ":DM_requestedMedication" } } ] }
</syntaxhighlight></div></div>


We see here that for any provider using EMIS Web or SystmOne as their source system, if the source table is either PrescriptionIssue or SRPrimaryCareMedication, we will eventually end up with the same context, a class of medication request and a property of requested medication.
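The slot rule in the node above (every slot must be filled, and a slot without a value accepts anything) can be sketched as follows. This is a minimal Python illustration, not the server's implementation: the dictionary layout and the function `node_is_triggered` are assumptions, while the slot names and values are taken from the example configuration.

```python
from typing import Dict, List, Optional

# Slots from the example node; None means "no Value, so any value will do".
GP_MED_REQUEST_SLOTS: Dict[str, Optional[List[str]]] = {
    "Source_Provider": None,
    "System": ["EMISWeb", "SystmOne"],
    "Table": ["PrescriptionIssue", "SRPrimaryCareMedication"],
}


def node_is_triggered(slots: Dict[str, Optional[List[str]]],
                      source: Dict[str, str]) -> bool:
    """Return True only if every slot is filled with an acceptable value."""
    for prop, allowed in slots.items():
        if prop not in source:
            return False  # an unfilled slot prevents the node from triggering
        if allowed is not None and source[prop] not in allowed:
            return False
    return True


emis_source = {
    "Source_Provider": "Chrisp Street Health Centre",
    "System": "EMISWeb",
    "Table": "PrescriptionIssue",
}
triggered = node_is_triggered(GP_MED_REQUEST_SLOTS, emis_source)
```

As a simplification, this sketch checks each slot independently, so it would also accept a mismatched pairing such as EMISWeb with SRPrimaryCareMedication.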
* As a means of converging slightly different maps to a common mapping context i.e. initially different contexts converging to the same context to enable re-usability of maps.


We have yet to deal with the remaining EMIS value problem. To do that we create another node to deal with the EMIS variation, which only requires the Codeid and the value as input slots.
Context is used by the client in order to shortcut requests via the API. Once the context has been defined, the mapping server presents it as a token for use by the client, so that the same context can be re-used as a shortcut to improve performance and make coding easier. The context token is then an exchange token that tells the mapping server that a particular map request is operating within a particular context. Alternatively, the client could send enough information to the server each time to provide the context for a map, so in this sense the context token is a convenience rather than a necessity.


This node calls a mapping function with the three parameters one of which is the property value of the input property
There are two ways for the client to handle context:
<div class="toccolours mw-collapsible mw-collapsed">
A simple mapping node with a property code look up
<div class="mw-collapsible-content">


# Provide the context with the API request. The mapping server reads the context and works out which mapping node to use to return the mapped property or value
# Provide the context identifier with the API request. The client already knows the context of a request and therefore can send it back in. Context operates as a sort of token.


<syntaxhighlight lang="json">
Client provided context may use some or all of the full context information, depending on whether the client knows the mapping server's mapping rules. For example, every GP practice using an EMIS Web system has the same context for all observations (as every EMIS GP system uses the same table structure). Thus the client does not need to send in the provider ID. However, if in doubt this can be sent in, as the mapping server will know to ignore it.
  {    "IRI": ":MN_EMISCodeIdLookUp",
        "Domain": ":InboundPublisherMaps",
        "Description": "Look up for EMIS code id",
        "Input": {
          "Property": "CodeId"},
        "MapFunction": {
          "Name": "CodeMap",
          "Parameter": [{
              "Fixed": "EMISCodeId"},
            {
              "PropertyValue": "CodeId"},
            {
              "Fixed": "DMD"}] },
        "AddOutput": {
          "Property": "Drug",
          "FunctionResult": "true"} } ]}
</syntaxhighlight></div></div>


We are now ready to converge the two outputs into the final node, which only has to do the final map.
In this example the request shows the full context definition of a field from an "Admitted patient care" table which requires a value map for the '1' in the field "administration_category_code"<syntaxhighlight lang="json">
<div class="toccolours mw-collapsible mw-collapsed">
{
The final mapping node in the chain performing a concept look up
"MapColumnValueRequest": {
<div class="mw-collapsible-content">
  "Provider": {
 
  "CodeScheme": "\"https://fhir.nhs.uk/ValueSet/ods\"",
<syntaxhighlight lang="json">
  "Value": "H123344",
{ "MapNode": [
  "Display": "BartsNHSTrust"
      {
  },
        "IRI": ":MN_IMMedicationrRequestEMISTPPDrugMap",
  "System": {
        "Domain": ":InboundPublisherMaps",
  "CodeScheme": "https://DiscoveryDataService.org/InformationModel",
        "Description": "Convergence nodes for medication requests from EMIS and TPP",
  "Value": "CernerMillenium"
        "Input" :[
  },
        {"MapNode" : ":MN_GPMedRequests"},
  "Schema": "CDS",
        {"Property": "Drug"}],
  "Table": "APC",
        "MapFunction" : {
  "Column": "administrative-category_code",
              "Name":"ConceptMapper"},
  "Value": {
        "AddOutput": {
  "CodeScheme": ":CM_NHS-DataDictionary",
          "Add":{"FunctionValue": true } } ]}  
  "Value": 1
</syntaxhighlight></div></div>
  }
 
}
Cumulatively, the result of the map contains the target class, property and value as well as the context node identifiers that generated it.
}
 
</syntaxhighlight>If the client knows the context the request could be as follows:<syntaxhighlight lang="json">
The final target output to the client being the map result set
<syntaxhighlight lang="json">
{
{
  "MapResultSet": {
  "MapColumnValueRequest": {
   "MappingShortcut": {
   "Context":"Barts/Cerner/CDS/APC/administration_classification_code",
  "ShortCutNode": ":GPMedRequests",
  "Value": {
  "RequiredProperty": ["CodeId"]},
  "CodeScheme": ":CM_NHS-DataDictionary",
  "Class": ":DM_MedicationRequest",
  "Value": 1
  "Property": ":DM_RequestedMedication",
   }
   "Value": ":SN_123300212120"
  }
  }
}
}
</syntaxhighlight>
</syntaxhighlight>
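The two ways of supplying context to a value map request can be sketched from the client side as follows. This is a minimal Python illustration that only builds the request bodies; the helper function names are hypothetical, no transport or endpoint is assumed, and the field values are copied from the examples on this page.

```python
import json
from typing import Any, Dict


def full_context_request(provider: Dict[str, str], system: Dict[str, str],
                         schema: str, table: str, column: str,
                         value: Dict[str, Any]) -> Dict[str, Any]:
    """Build a request that carries the full context definition."""
    return {"MapColumnValueRequest": {
        "Provider": provider, "System": system, "Schema": schema,
        "Table": table, "Column": column, "Value": value}}


def context_token_request(context: str, value: Dict[str, Any]) -> Dict[str, Any]:
    """Build a request that re-uses a context already known to the client."""
    return {"MapColumnValueRequest": {"Context": context, "Value": value}}


value = {"CodeScheme": ":CM_NHS-DataDictionary", "Value": 1}

full = full_context_request(
    provider={"CodeScheme": "https://fhir.nhs.uk/ValueSet/ods",
              "Value": "H123344", "Display": "BartsNHSTrust"},
    system={"CodeScheme": "https://DiscoveryDataService.org/InformationModel",
            "Value": "CernerMillenium"},
    schema="CDS", table="APC", column="administrative-category_code",
    value=value)

short = context_token_request(
    "Barts/Cerner/CDS/APC/administration_classification_code", value)

full_body = json.dumps(full, indent=2)
short_body = json.dumps(short, indent=2)
```

The shorter form carries the same value but replaces the provider, system, schema, table and column with the single context string.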


The underlying approach to authoring the maps themselves is described in the article [[Map maker manager]] which describes the fundamental concepts involved in making the maps for the server to deliver.


Of significance is the 'Mapping shortcut' produced as part of the result. This is a performance-enhancing value that the client can use in the next API call to speed up the response. If the client is confident that the class and property are going to be the same (as a result of being the same source table and field), then only the property name and value need be submitted.
== Map requests ==
One of the two main map APIs involves requesting a map for the purposes of mapping from a source schema to a target schema.


Furthermore, it can be seen from the logical two-step mappings that it is equally practical for clients to consider a direct map from source to destination, knowing that it has mapped to the common model as part of the process. This contrasts with conventional integration mappings, which map from many to many directly. In other words, by mapping in two stages we get a series of one to one maps which appear to be one to many.
The request-response exchange involves the use of JSON or XML conforming to the mapping syntax, which is a subset of the information model language. JSON or token based syntax is supported.


To more easily demonstrate how mappings work, there is a [[Mapping hint algorithms|working example]] showing a walk through of the use of the mapping API using the resource examples illustrated below
A request instance, like all the information model content, conforms to a subset of a single overarching modelling language. Thus a mapping document is both human and machine readable.


==Target DB schema resources==
The mapping server contains a repository of Maps, generated by the map server from a set of map nodes previously authored by a map maker.
[[File:DB Schema class.jpg|thumb|DB Schema class]]
Before doing any mappings, it is necessary to model a target schema in order to map to it.


Implementation schema resources are a set of objects of the class DBSchema (to the right)
=== Map Request ===
[[File:Map domain.jpg|thumb|Map domain class]]
A map request is a request for a map between a source schema and a target schema. The combination of source description, mapping process and target definition for a full schema can be referred to as a map domain.


The class is designed as a simple entity relationship class with 2 additional properties:
To request a map, a requester must first know the nature of the source and target they are requesting the map for. The descriptions of the source and the target are both provided in a structure containing a set of identifiers that give the mapping server sufficient context to know which map to retrieve.


#The name of the table's extension tables. These are optional triple tables designed so that a schema can continue to extend to additional properties and values using the information model to determine the properties and data types. This avoids the need to continually change the relational schema with new data items.
A map response contains a Domain map (class Map domain) which includes the source and target context for the map. The source and target terminals consist of identifiers for the source and target context properties.
#The name of the field holding the subtype indicator. This is described as the [[entity subtype attribute]].


The following is an example of a snippet from an encounter table:
A mapping request contains both source context and target context. If no target context is included, it is assumed that the target is the core information model.
<div class="toccolours mw-collapsible mw-collapsed">
A schema table example showing extension table and subtype field
<div class="mw-collapsible-content">


<syntaxhighlight lang="JSON">
The following is part of the body of a request for a mapping file for Barts Trust CDS schema to the information model<syntaxhighlight lang="json">
{"DBSchema": {
{
      "DBSchemaName": "Compass_version_1",
"MapRequest": {
      "DBTable": {
  "Provider": {
        "DBTableName": "encounter",
  "CodeScheme": "\"https://fhir.nhs.uk/ValueSet/ods\"",
        "DBExtensionTable": {
  "Value": "H123344",
          "DBTableName": "encounter_extension"
  "Display": "BartsNHSTrust"
        },
  },
        "DBSubTypeField": "type" } } }
  "System": {
  "CodeScheme": "https://DiscoveryDataService.org/InformationModel",
  "Value": "CernerMillenium"
  },
  "Schema": "CDS",
  "Target" :"https://DiscoveryDataService.org/InformationModel/Core"
}
}
</syntaxhighlight>
</syntaxhighlight>
</div></div>


The encounter table is expecting subtypes to be authored and therefore has a "subtype" field authored in the table in order to avoid generating many subtype tables.
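The combination of a subtype field and a triple-style extension table can be sketched with an in-memory SQLite database. This is an illustration under stated assumptions rather than the actual schema: the table names `encounter` and `encounter_extension` and the `type` field come from the example above, while the extension table's column names (`encounter_id`, `property`, `value`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE encounter (
    id   INTEGER PRIMARY KEY,
    type TEXT              -- subtype indicator, avoids one table per subtype
);
CREATE TABLE encounter_extension (
    encounter_id INTEGER REFERENCES encounter(id),
    property     TEXT,     -- property concept from the information model
    value        TEXT      -- value for that property
);
""")

# One encounter row whose subtype is held in the 'type' field.
conn.execute("INSERT INTO encounter (id, type) VALUES (?, ?)",
             (1, ":CM_HospitalInpAdmitEncounter"))

# A new property is added as a row, not as a new column in the schema.
conn.execute("INSERT INTO encounter_extension VALUES (?, ?, ?)",
             (1, ":DM_admissionPatientClassification", ":CM_AdmClassOrdinary"))

rows = conn.execute("""
    SELECT e.type, x.property, x.value
    FROM encounter e
    JOIN encounter_extension x ON x.encounter_id = e.id
""").fetchall()
```

Adding a further property later only requires another row in the extension table, which is the point of avoiding repeated changes to the relational schema.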
If the client knows that the map is provider independent and  they do not need to pass in the provider, then they need not populate the source provider property. If they do not know whether it is relevant or not, and they populate the property with the provider identifier, then the mapping server may choose to ignore that property if not relevant, and the response will not include the provider.
==Original source Resources==
Every map has a source and target. From the perspective of the information model an 'original source' represents a data model created from a publisher's source data i.e. is likely to be a relational or json representation of publisher data that might have been originally delivered as HL7 V2, XML, JSON, CSV or pipe delimited flat files. Source resources are therefore not representations of the actual data, but representations of a model that would be used when transforming to the common model. An example of this is a staging table.
[[File:Original Source resource.jpg|thumb|Source resource description|alt=|295x295px]]
It is assumed that a source may contain many tables, each with many fields, each with many values including text. It is not necessary for there to be actual tables, and fields and any object structure, masquerading as such, can be used. The terms 'table' and 'field' are used for convenience and refer to objects and properties just as well.


There may be differences between one provider and another using the same system, and between different versions of the system. Thus there is a need to provide context for each source resource.
The mapping document response also contains the actual source and target map domain object as well as the relevant maps, together with the contexts of the source and target.


Each element of source data must explicitly inherit the context so that the mapping API can recognise the context with each request.
In the example the domain map context is between Barts CDS and FHIR Care connect


The Original source resource object reflects a single logical thing to map. In most cases this will be a single field and single value. However, in some cases (such as free text sources), the source is derived from a list of fields, each with certain values.
<syntaxhighlight lang="json">
{
  "MapRequest": {
  "Context" :"Barts/Cerner/CDS",
  "Target" :"https://DiscoveryDataService.org/InformationModel/Core"
}
}
</syntaxhighlight>


If the client knows the map context it simply needs to pass in the source and target context as part of the request.


=== Map table column request ===
A mapping document contains one or more data maps between a source and a target. The source may be relational or in object form , and the target may be relational or object form, depending on the map domain parameters as described above.   


For example, a piece of text saying "negative", when contextualised as a result against a test for 'Hepatitis B surface antigen' would use compound context consisting of the table, the test field, the test code, the result field and the result text of negative.<div class="toccolours mw-collapsible mw-collapsed">
The only difference in syntax between object to object mapping and relational to object/relational mapping is the terminology used: tables, columns, joins and foreign keys versus objects, properties and relationships.
An example original source object from a CDS admitted patient care record with a value of '1' for the admitted patient classification
<div class="mw-collapsible-content">


<syntaxhighlight lang="JSON">
In this example the client is requesting the map for a particular column from an admitted patient care table. The client already knows the context identifier for the schema and the table.  <syntaxhighlight lang="json">
{ "OriginalSource": {
          "Provider": "Barts",
          "System": "CernerMillenium",
          "Context": {
            "id": 1,
            "Table": "APC",
            "Field": "PATIENT_CLASSIFICATION_CODE",
            "Value": 1 } } }
</syntaxhighlight>
</div></div>


The original source resource would be used as part of a mapping request submitted via the API
"MapColumnRequest": {
==Information model target resource==
"Context": "Barts/Cerner/CDS/APC",
The common information model target resource is the target of a map from an original source.
  "Column": "administrative-category_code"
}
</syntaxhighlight> 


The resource is delivered as part of the IM mapping API response.
=== Map column value request ===
[[File:IM Target resource.jpg|thumb|Target resource for a map]]
This API returns a target value from a source value set in the context of a source provider, source system, schema, table/class, and field or property.


The target resource indicates the class and properties of the target object, the object may need creating or adding to with the target property, and the target value.
Typically this might be used by a transform client when ascertaining what to do with unknown source values that are not part of the mapping domain, for example when new codes are required or where the value is from a legacy classification or text.


In some cases multiple sources may link to one target and in other cases multiple targets may link to one source, the link item being the "from id" link from the target.
In this example the client is seeking the mapped value for a field value of 1. The term is unknown (i.e. is not in a look up table from the source)<syntaxhighlight lang="json">
{
"MapColumnValueRequest": {
  "Provider": {
  "CodeScheme": "\"https://fhir.nhs.uk/ValueSet/ods\"",
  "Value": "H123344",
  "Display": "BartsNHSTrust"
  },
  "System": {
  "CodeScheme": "https://DiscoveryDataService.org/InformationModel",
  "Value": "CernerMillenium"
  },
  "Schema": "CDS",
  "Table": "APC",
  "Column": "administrative-category_code",
  "Value": {
  "CodeScheme": ":CM_NHS-DataDictionary",
  "Value": 1
  }
}
}
</syntaxhighlight>Sometimes, when mapping text or a value,  it is necessary to provide additional context from another field.


For example, in a source system, the word 'negative' may be a value of the field 'result text', associated with the code value '12345' in the field 'test', which is a field of the table 'clinical events', used by the system 'Cerner Millennium', in the hospital 'Barts NHS trust'. In that context it may mean something completely different from 'negative' with the code '12345' in another hospital, even one using the same system.


<div class="toccolours mw-collapsible mw-collapsed">
In this case a dependent field value is sent  with the value nested within the dependent field. In this example the mapping client is seeking the concept to represent a "positive" result for a test code.<syntaxhighlight lang="json">
An example target resource from a request containing an original source object
{
<div class="mw-collapsible-content">
"MapColumnValueRequest": {
  "Context": "/CDE/CLEVE/ResultCode",
  "DependentColumnValue": {
  "Column": "Event_Code",
  "Value": {
    "Value": 3562720
  }
  },
  "Value": {
  "Value": "Positive"
  }
}
}
</syntaxhighlight>


<syntaxhighlight lang="JSON">
== Map Response ==
  {"IMTarget": {
A data map uses a set of clauses in the order '''CREATE FROM''', '''WHERE''', which indicate that an object or property should be created from a source table or field, perhaps subject to some other criteria. The transform will either create an object, property or relationship, or reference an object already created in another process.
          "Fromid": 1,
          "Class": ":CM_HospitalInpAdmitEncounter",
          "DependentRelationship": {
            "Relationship": ":RM_isComponentOf",
            "Class": ":DM_HospitalInpEntry"
          },
          "PropertyValue": {
            "Property": ":DM_admissionPatientClassification",
            "Value": ":CM_AdmClassOrdinary"  } }}
</syntaxhighlight></div></div>


The class, properties and values of the IM resource all reference IM concepts.
Data maps may be nested so that maps that have created objects as dependent (child objects) can go on to populate the properties of the child objects. Also CREATE clauses can have nested data maps in order to go on to populate objects created as targets of relationships.  


This resource says that the target IM class is a "hospital inpatient admission encounter type". This object is dependent on the presence of a container encounter, in this case of type "hospital inpatient stay", and the relationship between them is 'is subcomponent of'. The field value source results in the property 'admission patient classification' with the value 'Ordinary admission'.
Many maps are simple one to one maps. However, the following example shows a part of a map that takes a flat CDS table row and creates two encounters, one of which is a sub-encounter to the other.  <syntaxhighlight lang="json">
"map": {
    "from": {
      "table": "cds_inpatient"
    },
    "create": {
      "context": "String",
      "object": {
      "class": ":DM_HospitalInpEntry",
      "idGenerator": {
        "auto": "true"
      }
      },
      "as": "@inpatient"
    },
    "map": [
      {
      "from": {
        "field": "spell_number"
      },
      "create": [
        {
        "object": {
          "class": ":CM_HospitalInpAdmitEncounter",
          "idGenerator": {
          "auto": "true"
          }
        },
        "as": "@admission"
        },
        {
        "relationship": {
          "relationshipType": ":RM_isComponentOf",
          "referenceTarget": "@inpatient"
        }
        }
      ]
      },
      {
      "from": {
        "field": "administrative_category_code"
      },
      "create": {
        "context": "/CDS/APC/administrative-category_code",
        "property": {
        "name": "DM_adminCategoryonAdmission",
        "value": {
          "valueMap": [
          {
            "sourceValue": "01",
            "targetValue": ":CM_AdminCat01"
          },
          {
            "sourceValue": "02",
            "targetValue": ":CM_AdminCat02"
          },
          {
            "sourceValue": "03",
            "targetValue": ":CM_AdminCat03"
          },
          {
            "sourceValue": "04",
            "targetValue": ":CM_AdminCat04"}]}}}}]}
</syntaxhighlight>To walk through the above example, a set of instructions:


Note that the information model resource uses subtypes as classes, in line with the ontology. It avoids the complexity involved in populating database schemas. However, the DB target resource does include the specific instructions as to how to populate the types.
From the table "cds_inpatient", create an object of the class "hospital inpatient entry" (which is a subclass of encounter) and assign it to the variable @inpatient.


==Map Request==
From the field "spell_number": 1. Create an object of class "hospital inpatient admission" (which is also a subclass of encounter) and assign it to the variable @admission.
For a map to be provided it needs to be requested and it is the job of the IM mapping API to respond to a request.
[[File:Mapping Resource.jpg|thumb|Mapping resource with 3 map options]]
A mapping resource is used both as a request and a reference i.e. can be exported as a set of maps or mapped on request.


Note that as the mapping API is designed to be used as individual requests for individual values, this class does not inherit properties or classes in the source or target.
2. Create a relationship 'subcomponent of', linking to the @inpatient object.
A typical map request using the above example could be:


3. From the field 'administrative_category_code' create property of adminCategoryOnAdmission  (noting the context)


<div class="toccolours mw-collapsible mw-collapsed">
From the above map the client also knows that it can use the /Map/Barts/Cerner/CDS/Inpatient/accc context identifier with the MapConceptValue API to map the code.
An example request from a source system to a DB target
<div class="mw-collapsible-content">


<syntaxhighlight lang="JSON">
<syntaxhighlight lang="json">
  {"Mapping":{
{ "sourceContext": {
        "Request": "SourcetoDB",
  "context": "/CDE/CLEVE"
        "OriginalSource": {
  },
          "Provider": "Barts",
  "field": {
          "System": "CernerMillenium",
  "name": "Event_Code",
          "Context": {
  "value": "3562720",
            "id": 1,
  "term": "SARS Cov2 PMR",
            "Table": "AdmittedPatientCare",
  "field": {
            "Field": "PatientClassificationCode",
     "name": "Event_Result",
            "Value": 1
     "term": "Positive"
          }
   }
        },
  }
        "TargetDBSchema": "Compass_version1" } ] }
</syntaxhighlight></div></div>
 
==Bringing it all together==
 
A working example of the above is illustrated as a [[mapping working example]] for a client wishing to go from a source to a database target<syntaxhighlight lang="json">
{
"MapResultSet": {
  "Result": [
  {
     "Class": ":DM_MedicationRequest"},
  {
     "Property": ":DM_RequestedMedication" },
   {
    "ContextNode": ":GPMedRequests" },
  {
    "Value": ":SN_123300212120" } ]}}
</syntaxhighlight>
</syntaxhighlight>

Latest revision as of 11:55, 2 July 2020

The mappings package is one of the 5 main component categories in the information model.

The data map APIs are a set of APIs provided by the information model server to aid with the transfer of data between two data sets, where the source and target data sets conform to a data model. Either the source or the destination data models (or both) may be implemented as either relational or graph or both.

The APIs cover two main types of activities:

  1. The retrieval of mapping files for use by a transform module for the full transform
  2. The mapping of properties and values to aid with existing transforms.

The extent to which these are used by a transform module is determined by the degree of knowledge already built into a transform, and the nature of the transformation software. It may simply be used as a human readable documentation. It may be used to enhance a current transform. It may also be used to drive a generic transformation engine.

Background and use cases

The Discovery information model supports a consistent approach to the presentation of maps using a mapping syntax heavily influenced by the W3C R2RML language and the ORM JPA approach as used by open source systems such as Hibernate.

The use of a mapping server enables the separation of concern between the maintenance of data maps themselves and the applications that transform data. As well as the technical separation it separates the business of map production from the business of transformation.

The mapping server can be used by a transform module operating as a client via the mapping APIs.

Unlike many transform tools, the mapping server does not provide run time mapping code. The server assumes that the transform client is technically able to map X to Y, but may not know what the map actually is. Use of the mapping server enables the transform client module to operate using a generic mapping process as long as the module can support the technology that X and Y data exists in.

The main challenge in transforming data is not so much the software as the knowledge of what maps to what, and how. This knowledge usually resides in the domain of informatics. The conventional approach of writing down this knowledge in documents or spreadsheets is helpful but insufficient. It is much better if this knowledge is presented in a form that is both machine readable and human readable, and better still if the mappings can be used to parameterise a generic mapping application.

There are many different mapping techniques in operation. In particular, object relational mapping tools based on this separation already exist, and there is a scattering of standards in this area. The approach in the Discovery mapping service therefore draws on the ideas inherent in W3C R2RML and in open source ORM systems such as Hibernate.

The APIs are designed to be used as an assistance to a code based ORM like transform, and can thus be used for some or all of the process, depending on the requirements of the transform client. In other words, the use of the APIs is a pick and mix approach, making it possible to introduce it into currently operating transformations without disruption.

The mapping API provides 3 levels of mapping:

  1. Full schema mapping. The mapping server operates like a SatNav by providing both a map and a set of instructions as to how to use it. It nevertheless keeps the control of the process within the code space of the transform application. It uses similar constructs to the data set definition language i.e. a query syntax, albeit simplified.
  2. Property mapping. The server maps a class/property or table/field to a property concept. This is useful for things like FHIR when working out the nature of an extension or key value pair.
  3. Value mapping. The means by which a legacy code or piece of text is mapped to a core ontology concept, the mapping being dependent on the context of the legacy code or term.

The above correspond to the 3 main APIs: Get map, Get mapped property, and Get mapped value.

Use of the mapping APIs

There are 3 main uses of the get map API:

  • Relational to object mapping (MR2O). Obtain a map which describes how to create or reference target objects from a set of source tables and fields, and how to map their values.
  • Object to relational mapping (MO2R). Obtain a map which describes how to create or reference a target database from a set of source objects and how to map their values.
  • Object to object mapping (MO2O). Obtain a map which describes how to create or reference target objects from a set of source objects and how to map their values.

The different approaches to the above maps are designed to address the problem of object relational impedance mismatch, which is the problem of moving between rows/columns and types/subtypes/properties.

The main use of the Get mapped property API is for assigning the common data model properties when processing legacy data from non standard sources.

The main use of the Get mapped value API is to take a codeable value in the source (e.g. a code or text), together with the context of that value, and obtain the common information model target concept (or code).

General approach to mapping

Choice of mapping steps

Discovery maps from a source only to the common information model's data model, and from the common data model to a target i.e. two mapping steps, often combined into one from the client perspective.

The benefit of this approach is that only one outbound map is required for each type of target (map from the common model) and only one inbound map for each source (map to the common model). The common data model acts as the hub that can "clean" and organise the data in a standardised way.
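The saving from mapping through a common model can be made concrete with a little arithmetic. The following is a minimal sketch only; the function name and the counts of 5 sources and 3 targets are illustrative assumptions, not figures from the Discovery service.

```python
def maps_required(sources: int, targets: int) -> dict:
    """Compare direct source-to-target mapping with mapping via a common model."""
    return {
        # Every source mapped directly to every target.
        "point_to_point": sources * targets,
        # One inbound map per source plus one outbound map per target.
        "via_common_model": sources + targets,
    }


# Illustrative figures only: 5 publishers feeding 3 target schemas.
counts = maps_required(sources=5, targets=3)
print(counts)
```

With these assumed figures, direct mapping needs 15 maps while the hub approach needs 8, and the gap widens as sources and targets are added.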

This approach also enables mapping clients to make use of either or both mapping steps, separately or grouped, depending on the use case. For example, currently implemented mapping processes that map to and from FHIR may only require a few additional IM facilities, such as what to do with new provider fields and what to put in a generic FHIR name/value pair extension.

The approach to the Discovery syntax is in line with the information modelling language, i.e. it uses class definitions to define the language syntax, which provides explicit labelling of the constructs. In line with the modelling language, a pure token based approach is also supported.

Use of Context

Context is used throughout the mapping model. Context indicates the context in which a particular data map artefact operates. The same map set in a slightly different context may operate in a different way. For example a mapping of X from system A from provider B, may be different from the mapping of X from system A from provider C. Context differentiates the two.

Context is used by the server in two ways:

  • As a means of identifying slight differences between seemingly similar maps i.e. a different context, thus avoiding the wrong map.
  • As a means of converging slightly different maps to a common mapping context i.e. initially different contexts converging to the same context to enable re-usability of maps.

Context is used by the client in order to short cut requests via the API. Having first defined context, it is then presented by the mapping server as a token for use by the client, to enable re-use of the same context as a short cut to improved performance and to make coding easier. The context token is then an exchange token that tells the mapping server that a particular map request is operating within a particular context. Alternatively the client could send enough information each time to the server to provide the context for a map, so in this sense the context token is a convenience rather than a necessity.

There are two ways for the client to handle context:

  1. Provide the context with the API request. The mapping server reads the context and works out which mapping node to use to return the mapped property or value
  2. Provide the context identifier with the API request. The client already knows the context of a request and can therefore send it back in. Context operates as a sort of token.

Client provided context may use some or all of the full context information depending on whether the client knows the mapping server's mapping rules. For example, every GP practice using an EMIS Web system has the same context for all observations (as every EMIS GP system uses the same table structure). Thus the client does not need to send in the provider ID. However, if in doubt this can be sent in, as the mapping server will know to ignore it.

In this example the request shows the full context definition of a field from an "Admitted patient care" table which requires a value map for the value 1 in the field "administrative-category_code":

{
 "MapColumnValueRequest": {
  "Provider": {
   "CodeScheme": "\"https://fhir.nhs.uk/ValueSet/ods\"",
   "Value": "H123344",
   "Display": "BartsNHSTrust"
  },
  "System": {
   "CodeScheme": "https://DiscoveryDataService.org/InformationModel",
   "Value": "CernerMillenium"
  },
  "Schema": "CDS",
  "Table": "APC",
  "Column": "administrative-category_code",
  "Value": {
   "CodeScheme": ":CM_NHS-DataDictionary",
   "Value": 1
  }
 }
}

If the client knows the context the request could be as follows:

{
 "MapColumnValueRequest": {
  "Context":"Barts/Cerner/CDS/APC/administration_classification_code",
  "Value": {
   "CodeScheme": ":CM_NHS-DataDictionary",
   "Value": 1
  }
 }
}
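The two request styles can be produced by a single helper on the client side. This is a minimal sketch, assuming a hypothetical function `build_value_request` that is not part of the mapping API; only the JSON property names are taken from the examples above.

```python
import json


def build_value_request(value, code_scheme, context=None, **full_context):
    """Build a MapColumnValueRequest body from a context identifier or from full context.

    Hypothetical client-side helper; only the JSON keys come from the examples.
    """
    body = {}
    if context is not None:
        # Short form: the client already holds the context identifier.
        body["Context"] = context
    else:
        # Long form: send whatever context the client knows, in the order used above.
        for key in ("Provider", "System", "Schema", "Table", "Column"):
            if key in full_context:
                body[key] = full_context[key]
    body["Value"] = {"CodeScheme": code_scheme, "Value": value}
    return {"MapColumnValueRequest": body}


short_form = build_value_request(
    1,
    ":CM_NHS-DataDictionary",
    context="Barts/Cerner/CDS/APC/administration_classification_code",
)
long_form = build_value_request(
    1,
    ":CM_NHS-DataDictionary",
    Schema="CDS",
    Table="APC",
    Column="administrative-category_code",
)
print(json.dumps(short_form, indent=1))
```

Keeping both forms behind one helper lets a client start with full context and switch to the context identifier once the server has supplied it.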

The underlying approach to authoring the maps themselves is described in the article Map maker manager which describes the fundamental concepts involved in making the maps for the server to deliver.

Map requests

One of the two main map APIs involves requesting a map for the purposes of mapping from a source schema to a target schema.

The request-response exchange involves the use of JSON or XML conforming to the mapping syntax, which is a subset of the information model language. JSON or token based syntax is supported.

A request instance, like all the information model content, conforms to a subset of a single overarching modelling language. Thus a mapping document is both human and machine readable.

The mapping server contains a repository of Maps, generated by the map server from a set of map nodes previously authored by a map maker.

Map Request

Map domain class

A map request is a request for a map between a source schema and a target schema. The combination of source description, mapping process and target definition for a full schema can be referred to as a map domain.

To request a map, a requester must first know the nature of the source and target they are requesting the map for. The descriptions of the source and the target are both provided in a structure that contains a set of identifiers giving the mapping server sufficient context to know which map to retrieve.

A map response contains a Domain map (class Map domain) which includes the source and target context for the map. The source and target terminals consist of identifiers for the source and target context properties.

A mapping request contains both source context and target context. If no target context is included, it is assumed that the target is the core information model.
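The defaulting rule for the target can be sketched as follows. This is an illustration of the rule as stated, not the server's implementation: the function name `resolve_target` is hypothetical, and the core model URI is taken from the request examples in this article.

```python
# URI of the core information model, as used in the request examples.
CORE_TARGET = "https://DiscoveryDataService.org/InformationModel/Core"


def resolve_target(map_request: dict) -> str:
    """Return the requested target context, defaulting to the core information model."""
    return map_request["MapRequest"].get("Target", CORE_TARGET)


# A request that names only the source context.
request_without_target = {"MapRequest": {"Context": "Barts/Cerner/CDS"}}
print(resolve_target(request_without_target))
```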

The following is part of the body of a request for a mapping file for Barts Trust CDS schema to the information model

{
 "MapRequest": {
  "Provider": {
   "CodeScheme": "\"https://fhir.nhs.uk/ValueSet/ods\"",
   "Value": "H123344",
   "Display": "BartsNHSTrust"
  },
  "System": {
   "CodeScheme": "https://DiscoveryDataService.org/InformationModel",
   "Value": "CernerMillenium"
  },
  "Schema": "CDS",
  "Target" :"https://DiscoveryDataService.org/InformationModel/Core"
 }
}

If the client knows that the map is provider independent and they do not need to pass in the provider, then they need not populate the source provider property. If they do not know whether it is relevant or not, and they populate the property with the provider identifier, then the mapping server may choose to ignore that property if not relevant, and the response will not include the provider.

The mapping document response also contains the actual source and target map domain object as well as the relevant maps, together with the contexts of the source and target.

In the example the domain map context is between Barts CDS and FHIR Care connect

{
 "MapRequest": {
  "Context": "Barts/Cerner/CDS",
  "Target": "https://DiscoveryDataService.org/InformationModel/Core"
 }
}

If the client knows the map context it simply needs to pass in the source and target context as part of the request.

Map table column request

A mapping document contains one or more data maps between a source and a target. The source may be in relational or object form, and the target may be in relational or object form, depending on the map domain parameters as described above.

The only difference in syntax between object to object mapping and relational to object/relational mapping is the terminology used: tables, columns, joins and foreign keys, versus objects, properties and relationships.

In this example the client is requesting the map for a particular column from an admitted patient care table. The client already knows the context identifier for the schema and the table.

{
 "MapColumnRequest": {
  "Context": "Barts/Cerner/CDS/APC",
  "Column": "administrative-category_code"
 }
}

Map column value request

This API returns a target value from a source value set in the context of a source provider, source system, schema, table/class, and field or property.

Typically this might be used by a transform client when ascertaining what to do with unknown source values that are not part of the mapping domain, for example when new codes are required or where the value is from a legacy classification or text.

In this example the client is seeking the mapped value for a field value of 1. The term is unknown (i.e. is not in a look up table from the source)

{
 "MapColumnValueRequest": {
  "Provider": {
   "CodeScheme": "\"https://fhir.nhs.uk/ValueSet/ods\"",
   "Value": "H123344",
   "Display": "BartsNHSTrust"
  },
  "System": {
   "CodeScheme": "https://DiscoveryDataService.org/InformationModel",
   "Value": "CernerMillenium"
  },
  "Schema": "CDS",
  "Table": "APC",
  "Column": "administrative-category_code",
  "Value": {
   "CodeScheme": ":CM_NHS-DataDictionary",
   "Value": 1
  }
 }
}

Sometimes, when mapping text or a value, it is necessary to provide additional context from another field.

For example, in a source system, the word 'negative' may be a value of the field 'result text', associated with a code value '12345' in the field 'test', which is a field of the table 'clinical events', used by the system 'Cerner Millennium', in the hospital 'Barts NHS trust'. It may mean something completely different to 'negative' with the code '12345' set in another hospital, even with the same system.

In this case a dependent field value is sent with the value nested within the dependent field. In this example the mapping client is seeking the concept to represent a "positive" result for a test code.

{
 "MapColumnValueRequest": {
  "Context": "/CDE/CLEVE/ResultCode",
  "DependentColumnValue": {
   "Column": "Event_Code",
   "Value": {
    "Value": 3562720
   }
  },
  "Value": {
   "Value": "Positive"
  }
 }
}
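To see why the dependent column matters, consider how a lookup might be keyed. The sketch below is an assumption about one possible approach, not the server's logic: the table `VALUE_MAPS`, the function `map_dependent_value`, the second event code, and both `:EXAMPLE_` concept identifiers are hypothetical placeholders.

```python
# Hypothetical lookup keyed by (context, dependent column, dependent value, text).
# The concept identifiers on the right are placeholders, not real IM concepts.
VALUE_MAPS = {
    ("/CDE/CLEVE/ResultCode", "Event_Code", 3562720, "positive"): ":EXAMPLE_ConceptA",
    ("/CDE/CLEVE/ResultCode", "Event_Code", 9999999, "positive"): ":EXAMPLE_ConceptB",
}


def map_dependent_value(request: dict):
    """Resolve a value whose meaning depends on the value of another column."""
    body = request["MapColumnValueRequest"]
    dependent = body["DependentColumnValue"]
    key = (
        body["Context"],
        dependent["Column"],
        dependent["Value"]["Value"],
        # Normalise free text so 'Positive' and 'positive' resolve alike.
        str(body["Value"]["Value"]).strip().lower(),
    )
    return VALUE_MAPS.get(key)


request = {
    "MapColumnValueRequest": {
        "Context": "/CDE/CLEVE/ResultCode",
        "DependentColumnValue": {"Column": "Event_Code", "Value": {"Value": 3562720}},
        "Value": {"Value": "Positive"},
    }
}
print(map_dependent_value(request))
```

The same text "Positive" resolves to a different placeholder concept when the dependent event code changes, which is the behaviour the nested request is designed to support.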

Map Response

A data map uses a set of clauses in the order CREATE FROM, WHERE, which indicate that an object or property should be created from a source table or field, perhaps subject to some other criteria. The transform will either create an object, property or relationship, or reference an object already created in another process.

Data maps may be nested so that maps that have created objects as dependent (child objects) can go on to populate the properties of the child objects. Also CREATE clauses can have nested data maps in order to go on to populate objects created as targets of relationships.

Many maps are simple one to one maps. However, the following example shows a part of a map that takes a flat CDS table row and creates two encounters, one of which is a sub-encounter to the other.

"map": {
     "from": {
      "table": "cds_inpatient"
     },
     "create": {
      "context": "String",
      "object": {
       "class": ":DM_HospitalInpEntry",
       "idGenerator": {
        "auto": "true"
       }
      },
      "as": "@inpatient"
     },
     "map": [
      {
       "from": {
        "field": "spell_number"
       },
       "create": [
        {
         "object": {
          "class": ":CM_HospitalInpAdmitEncounter",
          "idGenerator": {
           "auto": "true"
          }
         },
         "as": "@admission"
        },
        {
         "relationship": {
          "relationshipType": ":RM_isComponentOf",
          "referenceTarget": "@inpatient"
         }
        }
       ]
      },
      {
       "from": {
        "field": "administrative_category_code"
       },
       "create": {
        "context": "/CDS/APC/administrative-category_code",
        "property": {
         "name": "DM_adminCategoryonAdmission",
         "value": {
          "valueMap": [
           {
            "sourceValue": "01",
            "targetValue": ":CM_AdminCat01"
           },
           {
            "sourceValue": "02",
            "targetValue": ":CM_AdminCat02"
           },
           {
            "sourceValue": "03",
            "targetValue": ":CM_AdminCat03"
           },
           {
            "sourceValue": "04",
            "targetValue": ":CM_AdminCat04"}]}}}}]}

To walk through the above example, the map gives the following set of instructions:

From the table "cds_inpatient", create an object of the class "hospital inpatient entry" (which is a subclass of encounter) and assign it to the variable @inpatient.

From the field "spell_number":

  1. Create an object of class "hospital inpatient admission" (which is also a subclass of encounter) and assign it to the variable @admission.
  2. Create a relationship 'subcomponent of', linking to the @inpatient object.

  3. From the field 'administrative_category_code', create the property adminCategoryOnAdmission (noting the context).
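A transform client could follow these instructions with a small interpreter. The sketch below is one possible reading of the example, not the Discovery transform code: `run_map` and `as_list` are hypothetical names, the class names are written without stray characters, only two of the four value map entries are included, and it assumes that a property attaches to the object created by the enclosing map.

```python
from itertools import count

# The example map, abbreviated to two valueMap entries.
EXAMPLE_MAP = {
    "from": {"table": "cds_inpatient"},
    "create": {"object": {"class": ":DM_HospitalInpEntry"}, "as": "@inpatient"},
    "map": [
        {
            "from": {"field": "spell_number"},
            "create": [
                {"object": {"class": ":CM_HospitalInpAdmitEncounter"}, "as": "@admission"},
                {"relationship": {"relationshipType": ":RM_isComponentOf",
                                  "referenceTarget": "@inpatient"}},
            ],
        },
        {
            "from": {"field": "administrative_category_code"},
            "create": {"property": {"name": "DM_adminCategoryonAdmission", "value": {"valueMap": [
                {"sourceValue": "01", "targetValue": ":CM_AdminCat01"},
                {"sourceValue": "02", "targetValue": ":CM_AdminCat02"},
            ]}}},
        },
    ],
}


def as_list(item):
    """A 'create' clause may be a single object or a list of objects."""
    return item if isinstance(item, list) else [item]


def run_map(data_map, row, objects=None, ids=None, current=None):
    """Walk one data map against one source row; return created objects by variable name."""
    objects = {} if objects is None else objects
    ids = count(1) if ids is None else ids
    field = data_map.get("from", {}).get("field")
    for clause in as_list(data_map.get("create", [])):
        if "object" in clause:
            # Create a new object with an auto-generated id and remember its variable.
            current = {"id": next(ids), "class": clause["object"]["class"],
                       "properties": {}, "relationships": []}
            objects[clause["as"]] = current
        elif "relationship" in clause:
            # Link the most recently created object to a previously named one.
            rel = clause["relationship"]
            target_id = objects[rel["referenceTarget"]]["id"]
            current["relationships"].append((rel["relationshipType"], target_id))
        elif "property" in clause:
            # Translate the source field value through the valueMap.
            prop = clause["property"]
            lookup = {m["sourceValue"]: m["targetValue"] for m in prop["value"]["valueMap"]}
            current["properties"][prop["name"]] = lookup.get(row[field])
    for child in data_map.get("map", []):
        run_map(child, row, objects, ids, current)
    return objects


row = {"spell_number": "S1", "administrative_category_code": "01"}
objects = run_map(EXAMPLE_MAP, row)
print(objects["@admission"]["relationships"])
```

Because the interpreter is driven entirely by the map, the same code could process a different schema when given a different map, which is the generic transformation case described earlier.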

From the above map the client also knows that it can use the /Map/Barts/Cerner/CDS/Inpatient/accc context identifier with the MapConceptValue API to map the code:

{  "sourceContext": {
   "context": "/CDE/CLEVE"
  },
  "field": {
   "name": "Event_Code",
   "value": "3562720",
   "term": "SARS Cov2 PMR",
   "field": {
    "name": "Event_Result",
    "term": "Positive"
   }
  }