Mapping and matching concepts

From Discovery Data Service

Latest revision as of 11:30, 27 September 2021

Mapping API

Given contextual data, the information model mapping API returns both a property concept (representing the context) and a value concept (representing the value within that context). The structure of the mapping module allows multiple contexts to reference a common mapping node (for example, when the same type of value is received from multiple sources) and also allows multiple value types to be referenced from a common mapping node.

The logic of the version 1 live mapping is described more fully in the model mapping article.

Managing codes and taxonomies

Information consists of ideas. Another word for an idea is a 'concept'. A concept may be named (in which case the meaning of the concept can usually be understood), or it may be an unnamed expression, which is made up of a set of interrelated named or unnamed concepts.

For example the term "chest pain" implies the idea of a pain in the chest. In Snomed-CT it is a named concept. "Chest pain, worsened by exercise" may be an example of an expression style concept made up from the concept of "chest pain", and the statement that it is "made worse by -> exercise". In this case “made worse by” and “exercise” are both different concepts but no author has yet created a single named concept for this expression.

The new generation of health record management systems tend towards the recording of concepts, with the objective being for the record entry to closely match the idea behind the entry. These types of concepts can be called term based concepts as the term is the thing that describes the idea.

A modern term based concept is defined in relation to other concepts by a set of assertions indicating whether the concept is equivalent to, or a subtype of, a set of other concepts. The standard approach to this is via the use of Description Logic (DL). By using DL, a computer can automatically classify a concept, which can result in the computer deducing additional knowledge beyond that stated by the human who created the concept. Snomed-CT is the world's largest ontology of healthcare term based concepts and is authored using DL. A collection of concepts defined in this way constitutes an "Ontology", and a standard language, OWL, is used to represent the definitions.

These types of concepts are referred to as "Core concepts".

The idea of codes originated from a different starting point. The intention of a coded entry is to pre-classify an entry before it is recorded. The code is designed for a particular set of business processes, e.g. analytics or payment, and it is important to understand the context in which a code has been used. A coded concept, being pre-classified, relies on categorisation of the codes, and that classification may or may not imply that one code is a subtype of another. Nothing can be inferred from a code other than its relation to another code as authored. Consequently, as the philosophy is different, code based concepts have to be dealt with differently from term based concepts, even if they seem to be saying the same thing.

Because of their history, it is not always possible to assert the exact meaning of a code. However, meaning can often be inferred or approximated from a coded entry. Given the preference for moving to an ontology, this inference can be achieved via a mapping process that matches coded concepts to the term based concepts identified from a code.

These types of concepts are referred to as "legacy concepts".

There are two strategies to link legacy code concepts to core concepts.

1. A coded term may be stated confidently to be the same as, or a variation on, a concept. Typically code systems like Read2 or CTV3 can be dealt with in this way because they are designed to try to capture the idea in the clinician's mind, and they have been incorporated as concepts anyway. Likewise many system supplier codes have been created in this way. In this case the term code can be said to be a term code of the concept. Read2 G33 - Angina pectoris is a term code for the concept of angina pectoris.

2. A coded term might be the same term as a concept but may have been entered without the assertion that it is a true representation of a state. Typically code systems such as ICD10 and OPCS fall into this category. E11 - Diabetes type 2 seems to be the same as the concept of diabetes type 2, but was entered without clinician attestation and may have been approximated for payment purposes. In this case a legacy concept is produced and a map between this concept and the similar clinical concept is generated.

A map is just another form of relationship, but unlike an ontological equivalent or subclass axiom it implies that the relationship is an approximation. It is a sort of statement that something is possibly or probably similar to something else and thus has much less weight than an asserted relationship.

Legacy code based concepts can be mapped to core concepts, and this enables the use of the vast volumes of data already recorded in systems. Maps must be used with care, as the use of a mapped code in a query almost always depends on the purpose of the query. This means that mappings are more of a guide to the things to include than a confident statement of meaning. When querying records the query author may need to determine which codes to include or exclude on a case by case basis.

Maps between core concepts and legacy concepts

As mentioned above the relationships are managed as mappings which state the type or degree of match.

Maps generally fall into 2 patterns. These are illustrated in the context of code based concepts as follows:

Simple match

A core concept may be matched to many code based concepts. In a simple match the legacy concept is deemed to be probably equivalent to, or a subclass of, the core concept.


sn:194828000 |Angina (disorder)|
    :matchedTo emis:G33 |Angina Pectoris|.
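
As a rough illustration of how a simple match might be consulted when a query is expanded, the Python sketch below holds the matchedTo relationships in a dictionary keyed by core concept. The `MATCHED_TO` table and the `legacy_codes_for` function are hypothetical names chosen for this example and are not part of the information model API; the two identifiers are the ones used in the example above.

```python
from typing import Dict, Set

# Hypothetical in-memory store of simple matches: core concept -> legacy concepts.
MATCHED_TO: Dict[str, Set[str]] = {
    "sn:194828000": {"emis:G33"},  # Angina (disorder) -> Angina Pectoris
}


def legacy_codes_for(core_concept: str) -> Set[str]:
    """Return a copy of the legacy concepts matched to a core concept."""
    return set(MATCHED_TO.get(core_concept, set()))
```

Because a match is only an approximation, a query author would normally review the returned codes rather than include them all automatically.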

Complex optional match

A concept may be matched to a number of alternative concepts and it is expected that a query author may wish to select these.

In this example, the concept "Ketoacidotic coma due to diabetes mellitus (disorder)" has a complex map which is a combination of

a) Coma unspecified

and

b) one of Diabetes mellitus in pregnancy: Pre-existing diabetes mellitus, unspecified, or Diabetes mellitus in pregnancy, unspecified, or Diabetes mellitus arising in pregnancy.

In effect, the compound entry in the record would need to have two ICD10 codes to fulfil the criteria.

sn:26298008
  :hasMap [
       :combinationOf
           [ :oneOf icd10:R402 ],
           [ :oneOf icd10:O24.3, icd10:O24.9, icd10:O24.4 ]
  ] .
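
The combination rule in the map above can be sketched in Python as follows. This is only an assumed reading of `:combinationOf` and `:oneOf` (every group must be satisfied, and a group is satisfied by any one of its codes); the list representation and the `entry_matches` function are hypothetical and not taken from the mapping module.

```python
from typing import List, Set

# Assumed representation of the map: a combination of groups,
# where each group is satisfied by any one of its codes.
COMBINATION_OF: List[Set[str]] = [
    {"icd10:R402"},
    {"icd10:O24.3", "icd10:O24.9", "icd10:O24.4"},
]


def entry_matches(entry_codes: Set[str], combination: List[Set[str]]) -> bool:
    """True when the entry holds at least one code from every group."""
    return all(entry_codes & group for group in combination)
```

Under this reading, an entry coded with only `icd10:R402` does not satisfy the map, whereas an entry coded with `icd10:R402` and `icd10:O24.9` does.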


Mapping source fields

It is equally common to need to map source fields to core data model properties. In the information model a property is considered just another concept.

However, to use mappings for source fields it is necessary to use the context in which the source fields exist.

In the above examples, coded concepts were considered as context independent in the sense that the same code used by many providers and many systems would generally mean the same thing and can be treated the same way.


In the same way that codes can be mapped, so can source resource types such as tables or fields, message types or message segments. Mapping may involve functional transformation.

Defining source context

The first step in managing source concepts is to define the concept in the context of the originator of the data. This employs the use of a context object, usually sent as parameters through the REST API. For example:


{
  "organisation": "Barts",
  "system" : "CernerMillenium",
  "message" :"cds_type_130",
  "field" :"admin_cat_code"
}

Mapping nodes

A second step is for the mapping author to identify whether the source context is equivalent to another source context. This is done in order to rationalise the number of mapping steps needed between a source concept and the final target concept.
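
One way to picture the role of a mapping node is a two-stage lookup, sketched below in Python. Everything here other than the Barts context values and the `im:administrative_category_code_on_admission` identifier is an assumption made for illustration: the node identifier, the second (invented) context, and the function name are all hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class SourceContext(NamedTuple):
    organisation: str
    system: str
    message: str
    field: str


# Several equivalent contexts reference one shared node (hypothetical node id).
CONTEXT_TO_NODE: Dict[SourceContext, str] = {
    SourceContext("Barts", "CernerMillenium", "cds_type_130", "admin_cat_code"): "node:admin_category",
    # Invented second source, present only to show two contexts sharing a node.
    SourceContext("OtherTrust", "OtherSystem", "cds_type_130", "admin_cat"): "node:admin_category",
}

# Each node is mapped once to its target concept.
NODE_TO_CONCEPT: Dict[str, str] = {
    "node:admin_category": "im:administrative_category_code_on_admission",
}


def property_concept(context: SourceContext) -> Optional[str]:
    """Resolve a source context to its property concept via the shared node."""
    node = CONTEXT_TO_NODE.get(context)
    return NODE_TO_CONCEPT.get(node) if node is not None else None
```

With this arrangement, adding a new equivalent source context needs one new entry pointing at the existing node, rather than a new map to the target concept.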

Matching to concept

The third step involves creation of a source concept and a core concept. In the above example the source concept has been mapped to a core concept and the core concept is returned.

bc:BC_xyz
  rdfs:label "admin_cat_code"

and potential core map

im:administrative_category_code_on_admission
           :matchedTo 
           bc:BC_xysdasdasd.

  

The information model has fully defined the administrative category property as a property of a subclass of encounter dealing with hospital stays. Consequently the source system's table and field can be fully mapped to the common model field.


Legacy codes, terms and term codes

Legacy or local codes also require context and the same approach is used as described above. In this case a local code 12345655 may have a different meaning in a different system. A local code may or may not have any sort of code scheme (the scheme then being implied by the context). In the following example, Barts Trust has its own local Cerner code scheme.

{
  "organisation": "Barts",
  "system" : "CernerMillenium",
  "message" :"cds_type_130",
  "field" :"admin_cat_code",
   "codeScheme" :"BartsLocalCodes",
  "code" :123445556
}
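
A minimal sketch of how such a context object might be turned into a lookup key is given below in Python. The key layout, the `LOCAL_CODE_MAP` table, and the value concept identifier `bc:example_value_concept` are hypothetical and chosen only for illustration; the actual API may resolve local codes quite differently.

```python
import json
from typing import Dict, Optional, Tuple

CodeKey = Tuple[str, str, str, str, str, str]


def code_key(context: dict) -> CodeKey:
    """Build a lookup key from the context; the code is normalised to a string."""
    return (
        context["organisation"], context["system"], context["message"],
        context["field"], context["codeScheme"], str(context["code"]),
    )


# Hypothetical value concept identifier, for illustration only.
LOCAL_CODE_MAP: Dict[CodeKey, str] = {
    ("Barts", "CernerMillenium", "cds_type_130", "admin_cat_code",
     "BartsLocalCodes", "123445556"): "bc:example_value_concept",
}


def value_concept(context: dict) -> Optional[str]:
    """Return the value concept for a local code in its context, if mapped."""
    return LOCAL_CODE_MAP.get(code_key(context))


request = json.loads("""
{
  "organisation": "Barts",
  "system": "CernerMillenium",
  "message": "cds_type_130",
  "field": "admin_cat_code",
  "codeScheme": "BartsLocalCodes",
  "code": 123445556
}
""")
```

Including the organisation, system, message and field in the key means the same numeric code arriving from a different system resolves separately.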

Enumerated values

The NHS Data Dictionary employs enumerated numeric values for many of its field contents.

These are treated as local codes using the context to create a field specific code scheme. In many cases local trusts have extended the national scheme (or even changed the scheme), and as a result mapping nodes are used for the common codes.
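
The interplay between a shared mapping node for the common codes and trust-specific extensions could be sketched as below, in Python. All codes and concept names in this block are hypothetical placeholders rather than real NHS Data Dictionary values, and the override-then-fallback order is an assumption about how local changes would take precedence.

```python
from typing import Dict, Optional

# Shared mapping node for the nationally defined values of one field.
# All codes and concept names here are hypothetical placeholders.
NATIONAL_NODE: Dict[str, str] = {
    "01": "im:example_category_a",
    "02": "im:example_category_b",
}

# Per-trust entries for values a trust has added or redefined.
LOCAL_OVERRIDES: Dict[str, Dict[str, str]] = {
    "Barts": {"02": "im:example_category_c", "90": "im:example_category_d"},
}


def resolve_enumerated(organisation: str, code: str) -> Optional[str]:
    """Prefer a trust-specific mapping, then fall back to the shared node."""
    local = LOCAL_OVERRIDES.get(organisation, {})
    if code in local:
        return local[code]
    return NATIONAL_NODE.get(code)
```

Only the values a trust has changed or added need their own entries; every other value continues to resolve through the common node.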

Decision process for handling codes and terms

When incorporating codes and terms into the ontology, there are 4 categories of approach, the selected approach being dependent on the semantic relationship between the legacy and core concept. The categories are:

  1. Creation of a local concept and creation of a simple "matched to" map between a core concept and a legacy concept. This implies that the legacy concept is a subtype of, or equivalent to, the core concept. Examples of these are supplier local codes, Read 2 and TPP.
  2. Creation of a local or legacy concept and a complex "mapped to" map, built from "combination of" with "some of" or "one of", between a core concept and a legacy concept. This implies a more nuanced relationship, meaning that the user may elect how to use the maps when querying records.
  3. Mapping of the local context object to a mapping node which maps to a core concept, i.e. the core concept is used throughout the data store and the local context is replaced.
  4. Mapping of the local context object to a mapping node which maps to a legacy concept, which may or may not map from a core concept.