Data map API

From Discovery Data Service

This page is under review

Figure: Information model packages - mappings

The mappings package is one of the 5 main component categories in the information model.

The data map APIs are a set of APIs provided by the information model server to aid the transfer of data between two data sets, where the source and destination data sets each conform to a data model. The source data model, the destination data model, or both may be implemented in relational form, graph form, or both.

The APIs cover two main types of activities:

  1. The retrieval of mapping files for use by a transform module client, set in the context of a known source type and destination type.
  2. The mapping of values between source and destination concepts, terms or codes, set in the context of known source properties and destination properties obtained from the mapping files generated as above.

Background and use cases

The use of a mapping server enables a separation of concerns between the maintenance of data maps and the application that transforms data.

The mapping server is used by a transform module operating as a client via the mapping APIs.

The mapping server does not include an application that generates run-time mapping code. The server assumes that the transform client is able to map any X to any Y, but may not know what to map to what, or what to do along the way. The transform client module can be designed to be generic in respect of X and Y, as long as it supports the technologies in which X and Y exist.

The main challenge in transforming data is not so much the software as the knowledge of what maps to what, and how. This knowledge usually resides in the domain of informatics. The conventional approach of writing this knowledge down in documents or spreadsheets is helpful but insufficient. It would be much better if the knowledge were presented in a form that is both machine readable and human readable, and better still if the mappings could be used to parameterise a generic mapping application.

The philosophy of the Discovery Information model is to form a bridge between the two technical worlds of informatics and programming in a way that non-technical people can also visualise. Thus the approach to mappings sits one level of abstraction above the technology that operates the mappings and, unlike many integration solutions, is specifically designed not to impose a technology selection on the client.

The approach draws on the ideas inherent in W3C R2RML and in open source ORM systems such as Hibernate.

The APIs are designed to assist a code-based, ORM-like transform, and can be used for some or all of the process, depending on the requirements of the transform client. In other words, use of the APIs is a pick-and-mix approach, which makes it possible to introduce them into currently operating transformations without disruption.

Rather than simply providing a map, the mapping server operates like a SatNav, providing both a map and a set of instructions on how to use it, while keeping control of the process within the code space of the transformation module. It uses similar constructs to the data set definition language, i.e. a query syntax, albeit simplified.

There are 2 main APIs: Get map and Get mapped value.

Use of the two mapping APIs

There are 3 main uses of the Get map API:

  • Relational to object mapping (MR2O). Obtain a map which describes how to create or reference target objects from a set of source tables and fields, and how to map their values.
  • Object to relational mapping (MO2R). Obtain a map which describes how to create or reference a target database from a set of source objects, and how to map their values.
  • Object to object mapping (MO2O). Obtain a map which describes how to create or reference target objects from a set of source objects, and how to map their values.

The different approaches to the above maps are designed to address the problem of object-relational impedance mismatch, which is the problem of moving between rows/columns and types/subtypes/properties.

There is one main use of the Get mapped value API:

  • For a codeable value in the source (e.g. a code or text), when taken together with the context of that value, obtain the common information model target concept (or code).

General approach to mapping

Choice of mapping steps

Map to and from common model

Unlike a generic approach to mappings, which normally supports many-to-many mappings, Discovery maps from a source only to the common information model's data model, and from the common data model to a target, i.e. two mapping steps.

The benefit of this approach is that only one outbound map is required for each target (from the common model) and only one inbound map for each source (to the common model). The common data model acts as the hub that can "clean" and organise the data in a standardised way.
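The saving can be made concrete with a little arithmetic. The following sketch compares the number of maps to maintain when every source is mapped directly to every target with the number needed when everything passes through the common model. The function names and the example counts are illustrative assumptions and are not part of the mapping server.

```python
def direct_map_count(sources: int, targets: int) -> int:
    """Maps needed if every source is mapped directly to every target."""
    return sources * targets


def hub_map_count(sources: int, targets: int) -> int:
    """Maps needed via the common model: one inbound map per source
    plus one outbound map per target."""
    return sources + targets


# Illustrative figures only: 10 publisher sources and 4 targets.
direct = direct_map_count(10, 4)
via_hub = hub_map_count(10, 4)
```

With these assumed figures the direct approach needs 40 maps, whereas the hub approach needs 14, and adding a further source adds only one map rather than one per target.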

This approach also enables mapping clients to make use of either or both mapping steps, separately or together, depending on the use case. For example, it would be quite common to use the output of a map from a relational source to the common data model as a means of eventually mapping to FHIR, whether or not the mapping server supports the full two-step relational to FHIR map. This is because a currently implemented mapping process that maps to and from FHIR may only require a few additional IM facilities, such as what to do with new publisher fields and what to put in a generic FHIR name/value pair extension.

Context

Context is used throughout the mapping model. Context indicates the circumstances in which a particular map operates. The same map set in a slightly different context may operate in a different way. For example, a mapping of X from system A at provider B may be different from the mapping of X from system A at provider C. Context differentiates the two.

Context is used by the server in two ways:

  • As a means of identifying slight differences between seemingly similar maps i.e. a different context, thus avoiding the wrong map.
  • As a means of converging slightly different maps to a common mapping context i.e. initially different contexts converging to the same context to enable re-usability of maps.

Context is used by the client to shortcut requests via the API. Once a context has been defined, the mapping server presents it as a token for use by the client, so that the same context can be re-used, which improves performance and makes coding easier. The context token is thus an exchange token that tells the mapping server that a particular map request is operating within a particular context. Alternatively, the client could send enough information to the server each time to establish the context for a map, so in this sense the context token is a convenience rather than a necessity.
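A minimal sketch of how a client might re-use a context token is given here. It assumes that a context can be described by a (provider, system, schema) triple and that some function performs the full request to the server; the class name, the `resolve` callable and the stand-in `fake_resolve` function are all hypothetical and are not part of the API.

```python
from typing import Callable, Dict, Tuple

# Assumption: a context is described by (provider, system, schema).
ContextKey = Tuple[str, str, str]


class ContextTokenCache:
    """Remembers the context token obtained for each context description."""

    def __init__(self, resolve: Callable[[ContextKey], str]) -> None:
        # `resolve` stands in for the full request to the mapping server.
        self._resolve = resolve
        self._tokens: Dict[ContextKey, str] = {}
        self.server_calls = 0

    def token_for(self, key: ContextKey) -> str:
        # Only go to the server the first time a context is seen.
        if key not in self._tokens:
            self.server_calls += 1
            self._tokens[key] = self._resolve(key)
        return self._tokens[key]


def fake_resolve(key: ContextKey) -> str:
    # Hypothetical stand-in for the server; returns a fixed example token.
    return ":MDC_BartsAPCIM"


cache = ContextTokenCache(fake_resolve)
key = ("H12345", ":CM_System_CernerMillenium", "CM_Schema_CDS-APC")
first = cache.token_for(key)
second = cache.token_for(key)  # served from the cache, no second server call
```

The design choice is simply that the full context description is sent once and the token is used thereafter; a client that prefers to send the full description every time would not need the cache at all.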

The underlying approach to authoring the maps themselves is described in the article Map maker manager, which covers the fundamental concepts involved in making the maps for the server to deliver.

Request mapping document

A mapping document is an information model document subtype returned by the Mapping server on request from a transform client via the mapping REST API.

The mapping server contains a repository of Mapping documents, previously authored by the map authors.

Map domain class

A requester must first know the nature of the source and target they are requesting the map for. The description of the source and the target are both provided in a Map start/end object that contains a set of identifiers that provide sufficient context for the map server to know what map to retrieve.


The following example request body requests the map between the Barts trust CDS Admitted patient care schema and the Discovery information model:

{
  "fromSource": {
    "provider": {
      "value": "H12345",
      "display": "BartsNHSTrust",
      "codeScheme": "https://FHIR.nhs.uk/ods"
    },
    "system": {
      "value": ":CM_System_CernerMillenium",
      "codeScheme": "https://DiscoveryDataService.org/InformationModel"
    },
    "schema": {
      "value": "CM_Schema_CDS-APC",
      "codeScheme": "https://DiscoveryDataService.org/InformationModel"
    }
  },
  "toTarget": {
    "system": {
      "value": ":CM_System_DiscoveryCore",
      "codeScheme": "https://DiscoveryDataService.org/InformationModel"
    },
    "schema": {
      "value": "CM_Schema_DiscoveryInformationModel_V1",
      "codeScheme": "https://DiscoveryDataService.org/InformationModel"
    }
  }
}

The mapping document response also contains the actual source and target map domain object as well as the relevant maps.

If the requester knows that the map is provider independent and they do not need to pass in the provider, then they need not populate the source provider property. If they do not know whether it is relevant or not, and they populate the property with the provider identifier, then the mapping server may choose to ignore that property if not relevant and the response will not include the provider.
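The optional provider can be illustrated with a small request builder. This is a sketch only: the property names and code scheme URLs are taken from the example request body, but the function names `coded` and `build_map_request` are hypothetical, and the optional "display" property is omitted for brevity.

```python
import json
from typing import Optional

# Code scheme URLs as they appear in the example request body.
IM_SCHEME = "https://DiscoveryDataService.org/InformationModel"
ODS_SCHEME = "https://FHIR.nhs.uk/ods"


def coded(value: str, scheme: str) -> dict:
    """Build a value/codeScheme pair as used in the example request."""
    return {"value": value, "codeScheme": scheme}


def build_map_request(
    source_system: str,
    source_schema: str,
    target_system: str,
    target_schema: str,
    provider: Optional[str] = None,
) -> dict:
    """Build a map request body; the provider is left out when not supplied."""
    from_source = {}
    if provider is not None:
        from_source["provider"] = coded(provider, ODS_SCHEME)
    from_source["system"] = coded(source_system, IM_SCHEME)
    from_source["schema"] = coded(source_schema, IM_SCHEME)
    to_target = {
        "system": coded(target_system, IM_SCHEME),
        "schema": coded(target_schema, IM_SCHEME),
    }
    return {"fromSource": from_source, "toTarget": to_target}


# A provider-independent request: no provider property is populated.
request = build_map_request(
    ":CM_System_CernerMillenium",
    "CM_Schema_CDS-APC",
    ":CM_System_DiscoveryCore",
    "CM_Schema_DiscoveryInformationModel_V1",
)
body = json.dumps(request, indent=2)
```

Passing `provider="H12345"` to the same function adds the provider property, which the mapping server may still ignore if it is not relevant to the map.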

The mapping document, when returned, also confirms the source and target and, in addition, provides a context IRI that can be used for future calls to the API.

{"contextIRI":":MDC_BartsAPCIM"}

Maps

A mapping document contains one or more maps between a source and a target. The source may be in relational or object form, and the target may be in relational or object form, depending on the map domain parameters as described above.


Context dependency

For a source value to be understood, it is necessary to provide some context for it. All values are therefore set in some form of context. As a minimum this context would be the source property, i.e. the property of which the value is a value. In many cases, though, the context will be much more extensive.

For example, in a source system, the word 'negative', a value of the field 'result text' associated with a code value '12345', a value in the field 'test', which is a field of the table of 'clinical events', used by the system 'Cerner Millennium', in the hospital 'Barts NHS trust', may mean something completely different to 'negative' with the code '12345' set in another hospital, even with the same system.
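The effect of compound context on a value lookup can be sketched as a table keyed by the whole context rather than by the value alone. The key layout, the provider name 'Another hospital' and the two target identifiers are placeholders invented for illustration; they are not real map content or real IM concepts.

```python
from typing import Dict, Optional, Tuple

# Assumed key layout:
# (provider, system, table, code field, code, value field, value)
SourceContext = Tuple[str, str, str, str, str, str, str]

# Hypothetical map entries; the target identifiers are placeholders.
VALUE_MAPS: Dict[SourceContext, str] = {
    ("Barts NHS trust", "Cerner Millennium", "clinical events",
     "test", "12345", "result text", "negative"): ":EXAMPLE_TargetConceptA",
    ("Another hospital", "Cerner Millennium", "clinical events",
     "test", "12345", "result text", "negative"): ":EXAMPLE_TargetConceptB",
}


def lookup(context: SourceContext) -> Optional[str]:
    """Return the target for a value in its full context, or None if unmapped."""
    return VALUE_MAPS.get(context)


barts = lookup(("Barts NHS trust", "Cerner Millennium", "clinical events",
                "test", "12345", "result text", "negative"))
other = lookup(("Another hospital", "Cerner Millennium", "clinical events",
                "test", "12345", "result text", "negative"))
```

The same text and the same code resolve to different targets because the provider element of the key differs, which is the situation the paragraph above describes.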

A further layer of context includes the Domain in which the mapping takes place. A map generated for one purpose may be different when generated for another purpose.

For example, when processing published data into Discovery, the domain in question could be described as the 'inbound publisher mapping domain'.

Figure: Two mapping routes

Consequently, the implicit idea of a context and a domain is explicitly modelled as a 'Domain mapping node' object. The class and the various supporting classes are described in the following sections.

DB Schema class

Before doing any mappings, it is necessary to model a target schema in order to map to it.

Implementation schema resources are a set of objects of the class DBSchema.

The class is designed as a simple entity relationship class with 2 additional properties:

  1. The name of the table's extension tables. These are optional triple tables designed so that a schema can continue to extend to additional properties and values using the information model to determine the properties and data types. This avoids the need to continually change the relational schema with new data items.
  2. The name of the field holding the subtype indicator. This is described as the entity subtype attribute.

The following is an example of a snippet from an encounter table:

A schema table example showing extension table and subtype field

{"DBSchema": {
      "DBSchemaName": "Compass_version_1",
      "DBTable": {
        "DBTableName": "encounter",
        "DBExtensionTable": {
          "DBTableName": "encounter_extension"
        },
        "DBSubTypeField": "type" } } }

The encounter table expects subtypes to be authored and therefore has a subtype field (named "type" in the example) so that subtypes can be distinguished within one table, which avoids generating many subtype tables.
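The following sketch shows one way a client might use the two additional properties when writing an encounter. It reads the schema snippet from the example, places the subtype in the subtype field, and routes any property that is not a known column to the extension table as property/value pairs. The `plan_write` function, the `known_columns` argument and the sample property names are assumptions made for illustration; the DBSchema example does not list the table's columns.

```python
import json
from typing import Any, Dict, List, Set, Tuple

# The schema snippet from the encounter table example.
SCHEMA = json.loads("""
{"DBSchema": {
   "DBSchemaName": "Compass_version_1",
   "DBTable": {
     "DBTableName": "encounter",
     "DBExtensionTable": {"DBTableName": "encounter_extension"},
     "DBSubTypeField": "type"}}}
""")


def plan_write(
    schema: Dict[str, Any],
    subtype: str,
    values: Dict[str, Any],
    known_columns: Set[str],
) -> Tuple[Dict[str, Any], str, List[Tuple[str, Any]]]:
    """Split values into a main-table row and extension-table pairs."""
    table = schema["DBSchema"]["DBTable"]
    # The subtype goes into the field named by DBSubTypeField.
    row: Dict[str, Any] = {table["DBSubTypeField"]: subtype}
    extension_rows: List[Tuple[str, Any]] = []
    for prop, value in values.items():
        if prop in known_columns:
            row[prop] = value
        else:
            # Not a column of the main table: store as property/value.
            extension_rows.append((prop, value))
    return row, table["DBExtensionTable"]["DBTableName"], extension_rows


row, extension_table, extension_rows = plan_write(
    SCHEMA,
    ":CM_HospitalInpAdmitEncounter",
    {"patient_id": 7, "ward_colour": "blue"},  # hypothetical properties
    {"patient_id"},                            # hypothetical column list
)
```

In a real triple table each property/value pair would also carry the key of the main-table row; that detail is left out of the sketch.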

Original source Resources

Every map has a source and a target. From the perspective of the information model, an 'original source' represents a data model created from a publisher's source data, i.e. it is likely to be a relational or JSON representation of publisher data that might originally have been delivered as HL7 V2, XML, JSON, CSV or pipe delimited flat files. Source resources are therefore not representations of the actual data, but representations of a model that would be used when transforming to the common model. An example of this is a staging table.

Source resource description

It is assumed that a source may contain many tables, each with many fields, each with many values including text. There need not be actual tables and fields; any object structure masquerading as such can be used. The terms 'table' and 'field' are used for convenience and refer to objects and properties just as well.

There may be differences between one provider and another using the same system, and between different versions of the system. Thus there is a need to provide context for each source resource.

Each element of source data must explicitly inherit the context so that the mapping API can recognise the context with each request.

The Original source resource object reflects a single logical thing to map. In most cases this will be a single field and single value. However, in some cases (such as free text sources), the source is derived from a list of fields, each with certain values.


For example, a piece of text saying "negative", when contextualised as a result against a test for 'Hepatitis B surface antigen' would use compound context consisting of the table, the test field, the test code, the result field and the result text of negative.

An example original source object from a CDS admitted patient care record with a value of '1' for the admitted patient classification:

 { "OriginalSource": {
          "Provider": "Barts",
          "System": "CernerMillenium",
          "Context": {
            "id": 1,
            "Table": "APC",
            "Field": "PATIENT_CLASSIFICATION_CODE",
            "Value": 1 } } }

The original source resource would be used as part of a mapping request submitted via the API.
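A client might hold an original source in a small typed structure and serialise it to the shape shown in the example. The sketch below is one possible arrangement; the class names `SourceContext` and `OriginalSource` and the `to_dict` method are hypothetical, while the property names and values are taken from the example.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SourceContext:
    """The single logical thing to map: a table, a field and a value."""
    id: int
    table: str
    field: str
    value: Any


@dataclass(frozen=True)
class OriginalSource:
    """An original source with the provider and system it inherits."""
    provider: str
    system: str
    context: SourceContext

    def to_dict(self) -> Dict[str, Any]:
        # Property names follow the example original source object.
        return {
            "OriginalSource": {
                "Provider": self.provider,
                "System": self.system,
                "Context": {
                    "id": self.context.id,
                    "Table": self.context.table,
                    "Field": self.context.field,
                    "Value": self.context.value,
                },
            }
        }


source = OriginalSource(
    provider="Barts",
    system="CernerMillenium",
    context=SourceContext(1, "APC", "PATIENT_CLASSIFICATION_CODE", 1),
)
payload = json.dumps(source.to_dict())
```

Keeping the provider and system on every source object reflects the requirement that each element of source data carries its context explicitly.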

Information model target resource

The common information model target resource is the target of a map from an original source.

The resource is delivered as part of the IM mapping API response.

Target resource for a map

The target resource indicates the class and properties of the target object (which may need creating, or adding to with the target property) and the target value.

In some cases multiple sources may link to one target, and in other cases multiple targets may link to one source; the link item is the "from id" link from the target.


An example target resource from a request containing an original source object:

  {"IMTarget": {
          "Fromid": 1,
          "Class": ":CM_HospitalInpAdmitEncounter",
          "DependentRelationship": {
            "Relationship": ":RM_isComponentOf",
            "Class": ":DM_HospitalInpEntry"
          },
          "PropertyValue": {
            "Property": ":DM_admissionPatientClassification",
            "Value": ":CM_AdmClassOrdinary"  } }}

The class, properties and values of the IM resource all reference IM concepts.

This resource says that the target IM class is a "hospital inpatient admission encounter type". This object is dependent on the presence of a container encounter, in this case of type "hospital inpatient stay", and the relationship between them is 'is subcomponent of'. The source field value results in the property 'admission patient classification' and the value 'Ordinary admission'.

Note that the information model resource uses subtypes as classes, in line with the ontology. It avoids the complexity involved in populating database schemas. However, the DB target resource does include the specific instructions as to how to populate the types.
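As an illustration of how a transform client might act on a target resource, the sketch below turns the example IMTarget into a simple in-memory description of the object to create, keeping the "Fromid" link back to the source and recording the dependency on the container object. The function name `build_target_object` and the shape of the returned dictionary are assumptions; only the IMTarget content comes from the example.

```python
from typing import Any, Dict

# The example target resource.
IM_TARGET: Dict[str, Any] = {
    "IMTarget": {
        "Fromid": 1,
        "Class": ":CM_HospitalInpAdmitEncounter",
        "DependentRelationship": {
            "Relationship": ":RM_isComponentOf",
            "Class": ":DM_HospitalInpEntry",
        },
        "PropertyValue": {
            "Property": ":DM_admissionPatientClassification",
            "Value": ":CM_AdmClassOrdinary",
        },
    }
}


def build_target_object(resource: Dict[str, Any]) -> Dict[str, Any]:
    """Describe the object a client would create or add to."""
    target = resource["IMTarget"]
    prop = target["PropertyValue"]
    obj: Dict[str, Any] = {
        "from_id": target["Fromid"],  # link back to the original source
        "class": target["Class"],
        "properties": {prop["Property"]: prop["Value"]},
    }
    dependency = target.get("DependentRelationship")
    if dependency is not None:
        # The container object must exist before this one is attached.
        obj["depends_on"] = (dependency["Relationship"], dependency["Class"])
    return obj


encounter = build_target_object(IM_TARGET)
```

How the resulting description is persisted is left to the client, in keeping with the principle that the server does not impose a technology selection.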

Map Request

For a map to be provided it needs to be requested and it is the job of the IM mapping API to respond to a request.

Mapping resource with 3 map options

A mapping resource is used both as a request and a reference i.e. can be exported as a set of maps or mapped on request.

Note that as the mapping API is designed to be used as individual requests for individual values, this class does not inherit properties or classes in the source or target. A typical map request using the above example could be:


An example request from a source system to a DB target

 {"Mapping":[   {
        "Request": "SourcetoDB",
        "OriginalSource": {
          "Provider": "Barts",
          "System": "CernerMillenium",
          "Context": {
            "id": 1,
            "Table": "AdmittedPatientCare",
            "Field": "PatientClassificationCode",
            "Value": 1
          }
        },
        "TargetDBSchema": "Compass_version1" } ] }

Bringing it all together

A working example of the above, for a client wishing to go from a source to a database target, is illustrated below:

{
 "MapResultSet": {
  "Result": [
   {
    "Class": ":DM_MedicationRequest"},
   {
    "Property": ":DM_RequestedMedication" },
   {
    "ContextNode": ":GPMedRequests" },
   {
    "Value": ":SN_123300212120" } ]}}