Discovery health information model

From Discovery Data Service

Revision as of 09:33, 25 October 2020

The Discovery common information model consists of an integrated set of 5 packages that support machine processable and human readable information describing data as held in the Discovery Data Service's data stores.

The model involves the modelling of the meaning of the data, the way the data is arranged for certain business processes, the way the data is queried, and the way implementations are mapped to and from the information model.

More specifically, in this context, the information model is tailored to illustrate arrangements of health and care data as held in health records for the purposes of understanding and query as described by the information model services.

The information model is built using the Discovery information modelling language, which is a standards-based mixed language borrowing its main constructs from existing standard languages, as measured by fitness-for-purpose criteria.

Objectives of the Common Information model

The common information model is a set of components designed as a contribution to achieving the following objectives:

  • Enable people who are not technical experts to visualise and understand the structure and content of health records.
  • Enable people who are technical experts to design systems based on the logical structure and content of the model.
  • Enable people to define the data they need in order to perform advanced analytics or decision support, in particular where the definition involves subsumption testing.
  • Enable query authors to have a library of value sets (sets of concepts) and query definitions for re-use across the health sector.

The model is independent of implementation technology; that is, it is an abstract model and can therefore be implemented in a technology of choice.

The model language is standards based, so the model content can be exchanged using standards-based messages. However, Discovery also uses a class-based JSON/XML syntax that is easier to parse with object-oriented programming languages; that is, it can be parsed to and from objects using any JSON parser and any language that supports classes. The outputs of the model can be relational or graph, so they can easily be used by RDB or graph-based implementations.
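As an illustration of parsing a class-based JSON syntax to and from objects, the following is a minimal Python sketch. The field names (iri, properties, name, range) and the identifiers in the usage note are assumptions made for the example and are not taken from the Discovery syntax itself.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Property:
    # Hypothetical shape of a property: a name and the class of its values.
    name: str
    range: str


@dataclass
class Entity:
    # Hypothetical shape of a data model entity.
    iri: str
    properties: List[Property] = field(default_factory=list)


def entity_from_json(text: str) -> Entity:
    """Build an Entity object from its JSON representation."""
    raw = json.loads(text)
    props = [Property(**p) for p in raw.get("properties", [])]
    return Entity(iri=raw["iri"], properties=props)


def entity_to_json(entity: Entity) -> str:
    """Serialise an Entity object back to JSON."""
    return json.dumps(asdict(entity), sort_keys=True)
```

For example, entity_from_json('{"iri": ":DM_Observation", "properties": []}') would return an Entity whose iri is ":DM_Observation", and entity_to_json would turn it back into JSON text.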

At the core of the model is an ontology. The greater part of the ontology is based on the world-leading health ontology Snomed-CT, which is itself now defined using OWL2, the same ontology language that forms part of the Discovery modelling language.

Information model component packages

Information model.png

The model involves an integration between different types of components, linked together by a common modelling language which is both human readable (albeit quite technical) and machine readable. The components work together, so the generation of output or the query of content usually uses several components. Each type of component can be conceptualised as a UML package.

Information model component relationships

Each of the components interacts with components from other parts of the information model. The diagram on the right illustrates the main relationships between the components:

  • The information modelling language is the machine- and human-readable set of instructions stating what things mean, how they are classified, and what their classes and properties are. The language can be seen as a set of instructions to populate a model.
  • The semantic ontology is the set of concepts used in all parts of the information model, from clinical concepts through to data structure concepts.
  • The data model is a set of entities, attributes and value sets, all of which are defined precisely in the ontology. However, the data model, being created for a specific healthcare business, is separate from the ontology.
  • Value sets, or concept sets, are business-purpose-specific collections of concepts from the ontology, used in the data model or in queries. They contain concepts as defined in the ontology, using the ontology language, including advanced concept classes.
  • Data set definitions apply rules and filters to a data model in order to specify the nature of the entries and their content required in a purpose-specific data set.
  • Model maps specify how data is transformed from a data model to a particular database or messaging format.
  • Database schemas are reference schemas (RDB and maps) showing an implementation of a data model and data sets. Strictly speaking, these are not part of the information model but are included as “proof of solution” of the model.
  • Query definitions are a library of re-usable queries.


Semantic ontology

The Discovery ontology defines the meaning of the concepts that make up the content of health records. The meaning is defined in a way that a computer can use to reason and analyse.

In reality the ontology is a semantic web of ontologies but in most cases the external ontologies are more accurately referred to as classifications or code schemes. 

The exception to the rule is the world-leading Snomed-CT ontology, which is now based on a type of language known as description logic and is made available via three grammars: OWL2, Snomed compositional grammar, and Expression constraint language.

Main ontology structures

The ontology is made up of a number of concepts (classes or properties) which are the subjects of axioms, which relate concepts to other concepts in a fractal-like manner. The relationship can be illustrated as follows:

The ontology is precisely defined using the Discovery semantic ontology language, which is itself a syntactical simplification of the standard OWL2 language. The Discovery language exists in order to accommodate additional constructs not covered in OWL, namely data set definitions, value set definitions, and transactional messaging.


Core and legacy parts of the ontology

Relationship between core and legacy

The semantic ontology can be categorised into Core and Legacy concepts.

Core concepts are those concepts that have defined meanings, definitions being described by axioms which in Discovery have the form of OWL 2 axioms.

Legacy codes are classifications or local lists or various context based terms that are undefined except by their inferred position in a code hierarchy of some kind.

The Discovery ontology creates a relationship between core and legacy using a mapping relationship, the commonest being:

  1. Equivalent. Where the legacy code or term is deemed to be equivalent in meaning and definition to the core concept.
  2. Subclass. Where the legacy code or term is deemed to be a subclass of the core concept.
  3. Mapped to. Where the legacy code would be expected to be a member of the set defined by the core concept, but may not be sufficiently defined to be confident of equivalence or subclass.

From a mapping perspective, the maps operate from Core -> Legacy and not the other way round. For example, if one were searching for Diabetes using a core concept, and a patient had a diagnosis of the ICD10 code "Diabetes without mention of complication", then one would expect that patient to be found (depending on the enquirer's preference). However, if querying on "Diabetes without mention of complication", no core concept would be found, because the relationship does not operate in the Legacy -> Core direction. The exception to this rule is the "equivalent" axiom, which is bidirectional.

If the relationship between a core concept and a legacy code is "equivalent" or "subclass", this does not mean that the child codes of the legacy code would normally be included, as the child codes are often not subclasses from a semantic perspective. This is important to recognise when authoring queries that use core concepts but operate on data that uses legacy codes.
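The direction of the core and legacy maps can be sketched in Python as follows. This is a minimal illustration only: the concept identifiers, the relation names as strings, and the include_mapped_to switch (standing in for the enquirer's preference) are assumptions made for the example. Child codes of a legacy code are deliberately not expanded.

```python
from typing import List, NamedTuple, Set


class Mapping(NamedTuple):
    core: str
    relation: str  # "equivalent", "subclass" or "mapped_to"
    legacy: str


# Hypothetical maps; the identifiers are placeholders, not real codes.
MAPPINGS: List[Mapping] = [
    Mapping("core:Diabetes", "subclass", "legacy:DiabetesNoComplication"),
    Mapping("core:Diabetes", "mapped_to", "legacy:DiabetesPossible"),
    Mapping("core:Asthma", "equivalent", "legacy:Asthma"),
]


def legacy_codes_for(core: str, mappings: List[Mapping],
                     include_mapped_to: bool = True) -> Set[str]:
    """Core -> Legacy: legacy codes that a query on a core concept would match."""
    allowed = {"equivalent", "subclass"}
    if include_mapped_to:
        allowed.add("mapped_to")
    return {m.legacy for m in mappings
            if m.core == core and m.relation in allowed}


def core_concepts_for(legacy: str, mappings: List[Mapping]) -> Set[str]:
    """Legacy -> Core: only the bidirectional "equivalent" relation applies."""
    return {m.core for m in mappings
            if m.legacy == legacy and m.relation == "equivalent"}
```

With these placeholder maps, a query on "core:Diabetes" reaches the legacy diabetes codes, while a query on "legacy:DiabetesNoComplication" finds no core concept because its relation is not "equivalent".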

Value sets or concept sets

Main article : Value_sets

A value set definition, and its run-time counterpart, the value set transitive closure, is a set of class expressions collected together for a particular business purpose.

There are a range of purposes for a value set. Examples include defining a data set according to a set of recorded concepts, indicating the expected range of a property in a health record, and testing for the presence of a feature in a patient record.
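The idea of a transitive closure can be sketched in Python as below. This is a simplified illustration: it treats the value set definition as a list of named concepts rather than full class expressions, and the "is a" pairs and concept names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def transitive_closure(roots: Iterable[str],
                       is_a: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the root concepts plus all of their direct and indirect subclasses.

    is_a holds (child, parent) pairs taken from the ontology.
    """
    children = defaultdict(set)
    for child, parent in is_a:
        children[parent].add(child)

    seen = set(roots)
    queue = deque(seen)
    while queue:
        concept = queue.popleft()
        for child in children[concept]:
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                queue.append(child)
    return seen
```

Testing whether a recorded concept is subsumed by the value set then reduces to a set membership check against the closure, which is the reason for computing it ahead of run time.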

Data model

Main article : Data Model

The data model is the part of the ontology that defines classes required for particular business purposes.

Business purposes vary from the need to store particular items of data through to the need to display items in a certain way. This is the model that defines the ever-evolving structure of health records held within multi-domain health records, varying from common high-level classes through to specialised classes. An example of the former is an 'observation', and an example of the latter is a 'Blood pressure' or a 'histological/immunological report on a breast carcinoma'.

N.B. In ISO 13606 these are called archetypes and their derivative templates. In FHIR they are referred to as resources and profiles.

Data definitions - query

Data set definitions or queries are a key component of the information model.

A data set definition is a specification of a subset of data derived from one or more data models.

A data set definition, once established, can also be used as a source data model and thus data sets can be chained by placing a data set into the role of a data model.

A data set uses query like constructs to define its structures. Data set entities and data set attributes may be derived from a combination of ontology and data model query. To that extent, a data set definition can be said to use a query language.

The Discovery data definition language is not designed to operate as an actual query language, as it does not extend to include all the sophistication needed by a run time query language. For example, there are no optimisation techniques employed or references to the use of indexes. However, the language is sufficiently rich to be able to easily generate SQL or Cypher from the specification when used with a data model map to the implementation schema.
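To illustrate how a data set definition and a data model map could together yield SQL, here is a minimal Python sketch. The structure of the definition, the map, and the value sets is invented for the example and does not reproduce the Discovery data definition language. Table and column names come from the map, while value set members are passed as query parameters.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data set definition: one entity, two attributes, one filter.
DEFINITION = {
    "entity": "Observation",
    "attributes": ["patient", "effectiveDate"],
    "filter": {"attribute": "concept", "value_set": "BloodPressure"},
}

# Hypothetical data model map: entity and attribute names to table and columns.
MODEL_MAP = {
    "Observation": {
        "table": "observation",
        "columns": {
            "patient": "patient_id",
            "effectiveDate": "effective_date",
            "concept": "concept_id",
        },
    }
}

# Hypothetical expanded value sets (name to set of concept identifiers).
VALUE_SETS: Dict[str, Set[str]] = {"BloodPressure": {"bp:systolic", "bp:diastolic"}}


def to_sql(definition: dict, model_map: dict,
           value_sets: Dict[str, Set[str]]) -> Tuple[str, List[str]]:
    """Generate a parameterised SELECT statement from a data set definition."""
    entity_map = model_map[definition["entity"]]
    columns = [entity_map["columns"][a] for a in definition["attributes"]]
    sql = f"SELECT {', '.join(columns)} FROM {entity_map['table']}"
    params: List[str] = []

    flt = definition.get("filter")
    if flt:
        codes = sorted(value_sets[flt["value_set"]])
        placeholders = ", ".join("?" for _ in codes)
        column = entity_map["columns"][flt["attribute"]]
        sql += f" WHERE {column} IN ({placeholders})"
        params.extend(codes)
    return sql, params
```

Because the definition names only model entities and attributes, the same definition could in principle be run through a different map to target another schema or a graph query syntax.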

Data Maps

Data maps hold the maps for a variety of purposes, mainly being:

  • Maps between the data model and an implementation schema to enable auto generation of query syntax such as SQL or Cypher.
  • Maps between legacy data models and/or their values and the common information models.

Data mapping APIs

The article Data mapping APIs describes the use of a mapping service to deliver pre-prepared maps and actual map values for use by a transform application, and the article Map maker manager describes the processes involved in the making of maps for use by the mapping server.

Data map making

main article

Table Sub-type handling

Within the ontology, a data model entity may be a subclass of another data model entity, which means that the sub-entity is a more specialised class than its super entity. The sub-entity will have a number of business purpose specific attributes, defined according to the healthcare business they are used in.

Relational models do not directly support subclasses or inheritance. When implementing a data model entity in a relational or graph database, it is extremely unlikely that an implementer would create one table or one node type per entity. There would be potentially thousands, many of which would have identical properties and thus breach basic relational normalisation rules. Even where entities have slightly different attributes, it is unlikely that an additional table would be produced for each. More likely some form of extension capability such as a key value pair or triple table would be used.

Implementers would probably create a few high-level tables to which many scores of sub-entities would map. A solution must be found to the problem of sub-classing in the data model versus database tables or node types.

Consider the following intuitive statements.

  1. An accident and emergency attendance is a type of health event.
  2. A blood pressure recording is a type of observation.

To indicate the sub type, most implementers would implement a field that dictates the type or code (a ‘type field’ or ‘code field’), as FHIR does. To apply this, a special mapping property type, ‘Sub Type property’, is defined in the data map.

This can be relied on to drive business logic when populating or querying a record. In the above example, the following could be stored in the actual record entry:

Table         Field   Value
Encounter     type    Accident and emergency
Observation   type    Blood pressure

The following could be found in the map between the core information model and a relational implementation of the model.

 {
   "MapData": {
     "CreateTable": {
       "Context": "/Discovery/Compass/Encouter",
       "TableName": "Encounter",
       "Generator": {
         "IdColumn": "id",
         "From": {
           "ColumnName": [
             "encounter_id",
             "spell_number"
           ]
         }
       },
       "SubTypeColumn": {
         "ColumnName": "type",
         "Value": {
           "Concept": ":DM_HospitalInpEntry"
         }
       },
       "Alias": "enc1"
     },
     "FromObject": {
       "Class": ":DM_HospitalInpEntry"
     }
   }
 }

This says that the encounter row represents a subtype of encounter, i.e. a hospital inpatient entry.

In other words, the implementation map can make full use of the entity sub type hierarchy when mapping and searching the records, knowing that subtypes are indicated using the mapped ‘Sub-type property’, and can deduce the pattern of the data to be processed.
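As a sketch of how such a map might drive a search, the Python below reads an abridged copy of the map shown above and builds a parameterised query that selects only rows of the mapped sub type. The interpretation of the keys (TableName, Alias, IdColumn, SubTypeColumn, FromObject) is an assumption drawn from their names, and query_for_class is a hypothetical helper, not part of any Discovery API.

```python
from typing import List, Optional, Tuple

# Abridged copy of the map above (Context and Generator.From omitted).
MAP_DOC = {
    "MapData": {
        "CreateTable": {
            "TableName": "Encounter",
            "Generator": {"IdColumn": "id"},
            "SubTypeColumn": {
                "ColumnName": "type",
                "Value": {"Concept": ":DM_HospitalInpEntry"},
            },
            "Alias": "enc1",
        },
        "FromObject": {"Class": ":DM_HospitalInpEntry"},
    }
}


def query_for_class(map_doc: dict,
                    wanted_class: str) -> Optional[Tuple[str, List[str]]]:
    """Return (sql, params) selecting rows of the wanted class, or None
    if this map does not describe that class."""
    data = map_doc["MapData"]
    if data["FromObject"]["Class"] != wanted_class:
        return None

    table = data["CreateTable"]
    alias = table["Alias"]
    sub_type = table["SubTypeColumn"]
    sql = (
        f"SELECT {alias}.{table['Generator']['IdColumn']} "
        f"FROM {table['TableName']} AS {alias} "
        f"WHERE {alias}.{sub_type['ColumnName']} = ?"
    )
    return sql, [sub_type["Value"]["Concept"]]
```

Searching for a super entity could then be handled by running the same lookup for each of its sub types in the hierarchy and combining the results.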