Data map API

From Discovery Data Service
Revision as of 09:12, 27 May 2020 by DavidStables

The mappings package is one of the five main component categories in the information model.

Figure: Information model packages (mappings)

For the information model to be more useful than simply a reference, it helps to map the constructs in the core information model to the concrete implementations of the databases that hold data.

These maps support the automation of data population and retrieval, either by generating implementation-specific data manipulation language (e.g. SQL or Cypher) or by parameterising application functions used by code.
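As a rough illustration of the first option, the sketch below generates a parameterised SQL SELECT from a list of property-to-column maps. The `FieldMap` class, the `generate_select` function, and the property and column names are all hypothetical; they are not part of the Discovery mapping API, and a real generator would also need to handle joins, subtypes and extension tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMap:
    """Hypothetical map from one common-model property to one table column."""
    im_property: str
    table: str
    column: str


def generate_select(maps: list[FieldMap], key_property: str) -> str:
    """Generate a parameterised SELECT for the mapped properties.

    All maps must target the same table. The key property becomes the
    WHERE clause with a '?' placeholder instead of an inlined value.
    """
    tables = {m.table for m in maps}
    if len(tables) != 1:
        raise ValueError("this sketch only handles maps onto a single table")
    table = tables.pop()
    by_property = {m.im_property: m.column for m in maps}
    # Alias each column back to its common-model property name
    columns = ", ".join(f"{m.column} AS {m.im_property}" for m in maps)
    return f"SELECT {columns} FROM {table} WHERE {by_property[key_property]} = ?"


maps = [
    FieldMap("patient", "encounter", "patient_id"),
    FieldMap("effectiveDate", "encounter", "clinical_effective_date"),
]
print(generate_select(maps, "patient"))
```

Keeping the value as a `?` placeholder leaves parameter binding to the database driver rather than splicing values into the generated text.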

Maps (or mappings) require three main types of resource:

  1. A source resource, i.e. the thing that is mapped from
  2. Intermediate data manipulation rules that operate on the data between source and target
  3. A target resource, i.e. the thing that is mapped to from the source
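The three parts of a map can be pictured as a small data structure: a source, a target, and an ordered list of manipulation rules applied to each value in between. This is a minimal sketch, assuming a rule is simply a function from one value to another; the `Map` class and the source and target names are hypothetical and do not reflect the actual resource format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Assumption: a data manipulation rule is any one-argument function.
Rule = Callable[[Any], Any]


@dataclass
class Map:
    source: str                                      # the thing mapped from
    target: str                                      # the thing mapped to
    rules: list[Rule] = field(default_factory=list)  # applied in order

    def apply(self, value: Any) -> Any:
        """Pass a source value through each rule in turn."""
        for rule in self.rules:
            value = rule(value)
        return value


m = Map("APC.PATIENT_CLASSIFICATION_CODE", "hypotheticalTargetProperty",
        [str.strip, str.upper])
print(m.apply("  ab "))
```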


Mapping pipeline

The information model takes data from many sources and maps it to an abstract common model. The core databases in Discovery itself represent the common model via a set of implementation-specific schemas, so the core Discovery data schemas are themselves mapped from the core model.

Mappings may therefore be used by both subscriber and publisher databases, both inside and outside Discovery.

Furthermore, because every map logically has two steps (source to common model, then common model to target), clients can equally treat it as a direct map from source to destination, knowing that the data passed through the common model along the way. This contrasts with conventional integration mappings, which map many sources to many targets directly.
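The two-step idea can be sketched as composing two lookup tables: one from source items to common-model concepts and one from those concepts to target items. This is a simplified illustration, assuming each map is a plain one-to-one dictionary; the `compose` function and all the keys and values are hypothetical.

```python
def compose(source_to_common: dict[str, str],
            common_to_target: dict[str, str]) -> dict[str, str]:
    """Derive a direct source-to-target map by going via the common model.

    Source items whose common-model concept has no target map are dropped.
    """
    return {
        src: common_to_target[concept]
        for src, concept in source_to_common.items()
        if concept in common_to_target
    }


source_to_common = {"publisherA:code1": "im:ConceptX",
                    "publisherA:code2": "im:ConceptY"}
common_to_target = {"im:ConceptX": "subscriber.column_x"}
print(compose(source_to_common, common_to_target))
```

Routing everything through the common model is also what keeps the number of maps manageable: with N sources and M targets, a hub needs N + M maps (15 for 10 sources and 5 targets), whereas direct many-to-many integration needs up to N × M (50).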

To demonstrate how mappings work, a worked example walks through the use of the mapping API with the resource examples illustrated below.

Implementation schema resources

DB Schema class

Before doing any mappings, it is necessary to model a target schema in order to map to it.

Implementation schema resources are a set of objects of the class DBSchema (see the DB Schema class diagram).

The class is designed as a simple entity relationship class with two additional properties:

  1. The name of the table's extension table. Extension tables are optional triple tables that let a schema keep extending to additional properties and values, using the information model to determine the properties and data types. This avoids continually changing the relational schema for new data items.
  2. The name of the field holding the subtype indicator. This is described as the entity subtype attribute.
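To make the extension table idea concrete, the sketch below uses an in-memory SQLite database with an `encounter` table carrying a subtype column and an `encounter_extension` triple table of (entity, property, value) rows. The extension table's column names, the subtype value, and the property identifier are assumptions for illustration; this sketch also stores every value as text, whereas the page says the information model determines the data types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounter (
        id   INTEGER PRIMARY KEY,
        type TEXT NOT NULL            -- entity subtype attribute
    );
    CREATE TABLE encounter_extension (
        encounter_id INTEGER REFERENCES encounter(id),
        property     TEXT NOT NULL,   -- property identifier (hypothetical)
        value        TEXT             -- stored as text in this sketch
    );
""")


def set_extension(conn: sqlite3.Connection, encounter_id: int,
                  prop: str, value: str) -> None:
    """Add a property to an encounter without altering the relational schema."""
    conn.execute("INSERT INTO encounter_extension VALUES (?, ?, ?)",
                 (encounter_id, prop, value))


def get_extensions(conn: sqlite3.Connection, encounter_id: int) -> dict:
    """Return all extension properties of one encounter as a dict."""
    rows = conn.execute(
        "SELECT property, value FROM encounter_extension "
        "WHERE encounter_id = ? ORDER BY property", (encounter_id,))
    return dict(rows.fetchall())


conn.execute("INSERT INTO encounter (id, type) VALUES (?, ?)",
             (1, "AdmittedPatientCare"))
set_extension(conn, 1, "im:hypotheticalProperty", "21")
print(get_extensions(conn, 1))
```

A new data item becomes a new row in `encounter_extension` rather than a new column, which is the trade-off the extension table is meant to offer.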

The following is a snippet from an encounter table definition:

A schema table example showing extension table and subtype field

{"DBSchema": {
      "DBSchemaName": "Compass_version_1",
      "DBTable": {
        "DBTableName": "encounter",
        "DBExtensionTable": {
          "DBTableName": "encounter_extension"
        },
        "DBSubTypeField": "type" } } }
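A client reading this resource might pull out the table name, its extension table and its subtype field. The sketch below parses the snippet above, assuming (as in the snippet) that `DBTable` is a single object and that the extension table and subtype field may be absent; a full schema probably holds many tables, which this sketch does not handle. The `DBTable` dataclass and `parse_table` function are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional

SNIPPET = """{"DBSchema": {
      "DBSchemaName": "Compass_version_1",
      "DBTable": {
        "DBTableName": "encounter",
        "DBExtensionTable": {"DBTableName": "encounter_extension"},
        "DBSubTypeField": "type" } } }"""


@dataclass
class DBTable:
    name: str
    extension_table: Optional[str]  # name of the triple table, if any
    subtype_field: Optional[str]    # column holding the entity subtype, if any


def parse_table(doc: str) -> DBTable:
    """Read the single DBTable from a DBSchema JSON document."""
    table = json.loads(doc)["DBSchema"]["DBTable"]
    ext = table.get("DBExtensionTable")
    return DBTable(
        name=table["DBTableName"],
        extension_table=ext["DBTableName"] if ext else None,
        subtype_field=table.get("DBSubTypeField"),
    )


print(parse_table(SNIPPET))
# DBTable(name='encounter', extension_table='encounter_extension', subtype_field='type')
```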


Original source Resources

Every map has a source and a target. From the perspective of the information model, an 'original source' represents a data model created from original source data, i.e. it is likely to be a relational or JSON representation of publisher data that might have been delivered as HL7 V2, XML, JSON, CSV or pipe-delimited flat files. Source resources are therefore not representations of the actual data, but representations of a model that would be used when transforming to the common model.

Source resource description

It is assumed that a source may contain many tables, each with many fields, each with many values including text.

Furthermore, there may be differences between one provider and another using the same system, and between different versions of the system.

Thus there is a need to provide context. Each element of source data inherits its context so that the mapping API can recognise the context of each request. The original source resource object reflects a single logical thing to map. In most cases this will be a single field and a single value. However, in some cases (such as free-text sources), the source is derived from a list of fields, each with certain values.

For example, a piece of text saying "negative", when contextualised as a result against a test for 'Hepatitis B surface antigen', would use a compound context consisting of the table, the test field, the test code, the result field and the result text of negative.
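One way to picture compound context is a registry keyed on the whole set of field/value pairs, so that "negative" only resolves when it arrives together with the rest of its context. This is a sketch under assumptions: the registry, the `register` and `lookup` functions, the context keys and values, and the target identifier are all hypothetical and not taken from the mapping API.

```python
# Hypothetical registry: compound context (as an unordered set of
# field/value pairs) -> target concept identifier.
REGISTRY: dict[frozenset[tuple[str, str]], str] = {}


def register(context: dict[str, str], target: str) -> None:
    REGISTRY[frozenset(context.items())] = target


def lookup(context: dict[str, str]):
    """Return the target for an exact compound context, or None."""
    return REGISTRY.get(frozenset(context.items()))


hep_b_negative = {
    "Table": "lab_result",            # hypothetical table name
    "TestField": "test_code",         # hypothetical field holding the test
    "TestCode": "HBsAg",              # hypothetical code for the test
    "ResultField": "result_text",     # hypothetical field holding the result
    "ResultText": "negative",
}
register(hep_b_negative, "im:HypotheticalHBsAgNegative")

print(lookup(hep_b_negative))
print(lookup({"ResultText": "negative"}))  # text alone has no map
```

Using a frozenset makes the match independent of the order in which the fields are supplied, while still requiring every part of the context to be present.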

An example original source object from a CDS admitted patient care record with a value of '1' for the admitted patient classification

 { "OriginalSource": {
          "Provider": "Barts",
          "System": "CernerMillenium",
          "Context": {
            "id": 1,
            "Table": "APC",
            "Field": "PATIENT_CLASSIFICATION_CODE",
            "Value": 1 } } }

The original source resource would be used as part of a mapping request submitted via the API.
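The sketch below builds an OriginalSource object with the same shape as the example above and serialises it to JSON. It covers only the request body: the endpoint, HTTP method and any request envelope are not described on this page, so they are left out, and the `original_source` helper is hypothetical.

```python
import json


def original_source(provider: str, system: str, context_id: int,
                    table: str, field: str, value) -> dict:
    """Build an OriginalSource object shaped like the example above.

    Provider and System sit outside Context, so every context element
    sent for this provider/system pair shares them.
    """
    return {
        "OriginalSource": {
            "Provider": provider,
            "System": system,
            "Context": {
                "id": context_id,
                "Table": table,
                "Field": field,
                "Value": value,
            },
        }
    }


body = json.dumps(original_source("Barts", "CernerMillenium", 1,
                                  "APC", "PATIENT_CLASSIFICATION_CODE", 1))
print(body)
```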

Information model target resource

The common information model target resource is the target of the map from the original source and is delivered as part of the IM mapping API response.