Discovery health information model

From Discovery Data Service

Information modelling is the set of processes by which representations of data relationships are created, maintained and queried.

The Discovery models are designed both for human visualisation and for computers to use.

Systems that use the models can use any or all of three approaches:

  1. Direct use of the model data content as a database (or set of files that can populate a database via script)
  2. Use via a set of APIs (both local and remote) designed to provide access to the data within the model, or to trigger outputs of the model for use as in approach 1
  3. Use of the information model technologies themselves, via the published open source code

Information model functions

Information models have four core functional requirements internal to the model: description of the model, validation of model content, population of the model, and query of the model. In support of query there is also a need to support inference, which generates new insights that were not necessarily authored.

In addition, the information model must support the same four core functional requirements on the actual health data that it models.

  • Description of the model. There is little point in having a model unless it can be described and understood. Knowing what is in a model is a prerequisite to using it. For example, there is no point in trying to find out whether a patient record indicates that the patient has diabetes if the model does not include the ability to record it. In order to understand a model, two techniques are required: diagrammatic representation and human readable text representation. A model must support both.
  • Data validation is essential for consistent business operations. Data models, user input forms and data set specifications are designed to enable data collections to be validated, and maintaining a standard for data collection is essential. For example, if you have a patient record in front of you, you will likely need to know the patient's approximate age. To work this out, the date of birth must be recorded, so validating that the date of birth can be, and has been, recorded is important. However, if more than one date of birth were recorded for the same patient, the record would be less valuable. Thus a modelling language must include the ability to constrain data models to suit particular business needs.
  • Population of the model. It is impractical to build model content from scratch and likewise virtually impossible to populate instances with existing data without some manipulation. An information model must contain the ability to model mappings between currently held data and model conformant data.
  • Enquiry (or query) is necessary to generate information from data. There is little point in recording data unless it can be interrogated and the results of the interrogation acted upon. Thus a modelling language must include the ability to query the data as defined or described, including the use of inference rules to find data that was recorded in one context for use in another.
  • Inference is pivotal to decision making. For example, if you are about to prescribe a drug containing methicillin to a patient who has previously stated that they are allergic to penicillin, it is reasonable to infer, because methicillin is a penicillin, that an allergic reaction might ensue if they take the drug, and thus that another drug should be prescribed. Thus a modelling language must include the ability to infer and classify things so that safe decisions can be made.
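The inference step in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, not the Discovery reasoner: the concept names and the `SUBCLASS_OF` table are hypothetical, and the drug is simplified to its active ingredient. It shows how walking subclass relationships lets a recorded penicillin allergy be matched against a methicillin prescription even though nobody authored that specific link.

```python
# Hypothetical subclass assertions (child -> direct parents). The names are
# illustrative labels, not real Discovery or Snomed-CT identifiers.
SUBCLASS_OF = {
    "Methicillin": {"Penicillin"},
    "Amoxicillin": {"Penicillin"},
    "Penicillin": {"Beta-lactam antibiotic"},
    "Beta-lactam antibiotic": {"Antibiotic"},
    "Clarithromycin": {"Macrolide antibiotic"},
    "Macrolide antibiotic": {"Antibiotic"},
}


def ancestors(concept: str) -> set[str]:
    """Every class that subsumes `concept`, following subclass links transitively."""
    found: set[str] = set()
    stack = list(SUBCLASS_OF.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return found


def allergy_conflicts(drug: str, allergies: set[str]) -> set[str]:
    """Recorded allergens that the drug either is or is a subtype of."""
    return allergies & (ancestors(drug) | {drug})


patient_allergies = {"Penicillin"}
print(allergy_conflicts("Methicillin", patient_allergies))     # {'Penicillin'}
print(allergy_conflicts("Clarithromycin", patient_allergies))  # set()
```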

Model structure

A model must be built from some structure, using some tools or processes to build it. This section describes the nature of the structure that makes up the information model. The tools used to build the model include an information modelling language, which is described separately.

IM main structural types as classes

A model must itself have a model, i.e. a meta model that describes the types of things the model is made up of.

The Discovery model can be described as an "Object Role Model (ORM) that includes an Ontology as one of the roles". It can also be described and implemented as a small number of main classes with each main class covering a role type.

Both perspectives

The roles themselves can be categorised into types. Because one of the role types is ontological axioms, the model can operate both under the open world assumption (as required by the semantic web) and under a closed world assumption (as required by the business of healthcare).
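The practical difference between the two assumptions shows up when a statement is simply absent. The sketch below is illustrative only (the `ask` function, the `Answer` type and the facts are hypothetical): under a closed world assumption a missing record of diabetes means "no", whereas under the open world assumption it only means "not known".

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical asserted facts as (subject, predicate, object) triples.
FACTS = {
    ("patient1", "hasCondition", "Diabetes"),
    ("patient2", "hasCondition", "Asthma"),
}


def ask(subject: str, predicate: str, obj: str, closed_world: bool) -> Answer:
    """Answer whether a triple holds under the chosen world assumption."""
    if (subject, predicate, obj) in FACTS:
        return Answer.TRUE
    # A missing statement is false under the closed world assumption,
    # but merely unknown under the open world assumption.
    return Answer.FALSE if closed_world else Answer.UNKNOWN


print(ask("patient2", "hasCondition", "Diabetes", closed_world=True))   # Answer.FALSE
print(ask("patient2", "hasCondition", "Diabetes", closed_world=False))  # Answer.UNKNOWN
```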

The main types are illustrated in the right hand image.

Interaction between the model and the external world takes place via the Discovery information model language (or alternatively a set of W3C recommended languages). These are described separately, but in essence the language is built from RDF triples applying the W3C grammars and vocabularies of OWL2, SHACL and SPARQL, with support for GraphQL.

The following sections briefly describe the various model classes illustrated above. The IM language specification provides more insight into the details of the classes.


Concept

All things that can be referenced via an identifier can be thought of as a concept. Even the classes and structures of the information model themselves are concepts.

A concept is defined as an ‘abstract idea’ or ‘general understanding of something’, and this meaning is preserved in the modelling language. It is one of the few abstract classes in the information model. This means that there is no actual object of type concept unless it is also an instance of some subtype of concept.

Types of concept include: Class, Property, Shape, Value set, Data type, Query, Collection, Term and Annotation. Each of these inherits the core properties of a concept and specialises them by extension, in both function and properties.

Use of aliases. Aliases enable properties and classes to be referred to by an alias rather than by their IRI. The language specification describes how context is used to provide aliases, so that key terms can be used in business processes without the inconvenience of full IRIs. These sections therefore use aliases by convention, the aliases themselves being defined as aliases of concepts.

In this section, aliases that stand for a number of alternatives are enclosed in { }, e.g. {axiom} in a class refers to a subclass, equivalent class or disjoint class axiom.
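Alias resolution through a context can be pictured as a simple lookup table. In this sketch the alias names, the `CONTEXT` and `GROUPS` tables and the `expand` function are assumptions made for illustration; only the W3C IRIs for `rdfs:subClassOf`, `owl:equivalentClass`, `owl:disjointWith` and `rdfs:label` are real. A `{group}` alias expands to all of its alternatives, as `{axiom}` does above.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical context mapping aliases to IRIs.
CONTEXT = {
    "subClassOf": RDFS + "subClassOf",
    "equivalentClass": OWL + "equivalentClass",
    "disjointWith": OWL + "disjointWith",
    "name": RDFS + "label",
}

# Hypothetical group aliases: {axiom} stands for several alternatives.
GROUPS = {"axiom": ["subClassOf", "equivalentClass", "disjointWith"]}


def expand(term: str) -> list[str]:
    """Resolve an alias or a {group} alias to the IRI(s) it stands for."""
    if term.startswith("{") and term.endswith("}"):
        return [CONTEXT[alias] for alias in GROUPS[term[1:-1]]]
    # Terms not found in the context are assumed to be full IRIs already.
    return [CONTEXT.get(term, term)]


print(expand("name"))  # ['http://www.w3.org/2000/01/rdf-schema#label']
print(expand("{axiom}"))
```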

A concept also comes with a fixed set of annotation properties that can be relied on to be either present or null:

Property alias | Cardinality | Type | Description
iri | 1 | IRI | An international identifier, in the format described within the language specification
status | 1 | Status type | A status concept representing the activity status of the concept, e.g. active or inactive
name | 1 | String | The full name of the concept (the preferred term in Snomed-CT). In OWL2 this is a label annotation
description | 1 | String | A plain language meaning of the concept, and how it may be used
version | 1 | Integer | The version in which this concept was first created
code | 0..1 | String | If the concept has a code, the code assigned to this concept by the original creator, e.g. a Snomed-CT, READ2, ICD10, OPCS, local or auto-generated code
scheme | 0..1 | IRI | If the concept has a code, the code scheme assigned to this code, the scheme itself being an IRI
termKey | 0..* | String | A number of keys used to link to the concept. Not to be confused with a term concept, which is an alternative term linked to a concept
annotation | 0..* | Annotation | Additional informative simple string properties used for a variety of business purposes
alias | 0..* | String | Aliases for this concept, i.e. reserved terms within the context of a particular application that implements the information model and wishes to use aliases rather than IRIs
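One way to hold these core properties in code is an abstract base type that every concrete concept type extends. The sketch below assumes Python dataclasses; the field names follow the aliases in the table, but the class names and the example IRIs are hypothetical. Making `Concept` abstract mirrors the rule that nothing is only a concept: it can be instantiated only through a subtype such as `OntologicalClass`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept(ABC):
    """Core annotation properties every concept carries."""
    iri: str
    status: str            # status concept, e.g. "Active" or "Inactive"
    name: str              # full name; an rdfs:label annotation in OWL2
    description: str
    version: int           # version in which the concept was first created
    code: Optional[str] = None
    scheme: Optional[str] = None          # IRI of the code scheme, if coded
    term_keys: list[str] = field(default_factory=list)
    annotations: dict[str, str] = field(default_factory=dict)
    aliases: list[str] = field(default_factory=list)

    @abstractmethod
    def concept_type(self) -> str:
        """Concrete subtypes report their type; Concept itself cannot be instantiated."""


@dataclass
class OntologicalClass(Concept):
    # (axiom alias, target class IRI) pairs, e.g. ("subClassOf", "...").
    axioms: list[tuple[str, str]] = field(default_factory=list)

    def concept_type(self) -> str:
        return "Class"


diabetes = OntologicalClass(
    iri="http://example.org/im#Diabetes",          # hypothetical IRI
    status="Active",
    name="Diabetes mellitus",
    description="A disorder of glucose regulation",
    version=1,
    axioms=[("subClassOf", "http://example.org/im#EndocrineDisorder")],
)
print(diabetes.concept_type(), diabetes.code)  # Class None
```

Calling `Concept(...)` directly raises `TypeError`, because its abstract method is not implemented.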

Ontological Class

An ontological class is an extension of the concept class and is used as the main means for defining semantic concepts that are classes of objects for use in healthcare records.

The difference between an ontological class (often referred to as an OWL class) and a simple concept is that it can be semantically defined by the use of class axioms. Class axioms such as subclass or equivalent class axioms are used for reasoning (inference and classification) and enable the information model to be queried using subsumption queries.

Property alias | Cardinality | Type | Description
type | 1 | Class | A type of concept that is a class for the purposes of ontological definition
{axiom} | 0..* | Class axiom | An axiom normally used to define a class, e.g. subclass, equivalent class, disjoint classes
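A subsumption query asks for a class together with everything it subsumes. The sketch below assumes only two kinds of named class axiom, subclass and equivalent class, held as hypothetical `(axiom, class, other class)` tuples; a real OWL reasoner also handles complex class expressions, which this does not. Treating equivalent classes as subsuming each other is what lets a query on `DiabetesMellitus` return the subtypes asserted under `Diabetes`.

```python
from collections import defaultdict

# Hypothetical named class axioms: (axiom, class, other class).
AXIOMS = [
    ("subClassOf", "Type1Diabetes", "Diabetes"),
    ("subClassOf", "Type2Diabetes", "Diabetes"),
    ("equivalentClass", "DiabetesMellitus", "Diabetes"),
    ("subClassOf", "Diabetes", "EndocrineDisorder"),
]


def children_index(axioms):
    """Map each class to the classes it directly subsumes."""
    children = defaultdict(set)
    for axiom, a, b in axioms:
        if axiom == "subClassOf":
            children[b].add(a)
        elif axiom == "equivalentClass":   # equivalent classes subsume each other
            children[a].add(b)
            children[b].add(a)
    return children


def subsumption_query(cls, axioms):
    """The class itself plus every class it subsumes, directly or transitively."""
    children = children_index(axioms)
    result, stack = {cls}, [cls]
    while stack:
        for child in children[stack.pop()]:
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


print(sorted(subsumption_query("DiabetesMellitus", AXIOMS)))
# ['Diabetes', 'DiabetesMellitus', 'Type1Diabetes', 'Type2Diabetes']
```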

Ontological property

An ontological property is an extension of the concept class and is used as the main means for defining semantic concepts that are used as properties or predicates. The difference between this and a class is that properties themselves cannot have properties. Nevertheless, the use of property axioms to define properties makes them very powerful: sub-properties are included in subsumption tests on classes, and inverse property axioms link properties that operate in opposite directions in a graph.

Property alias | Cardinality | Type | Description
type | 1 | Property | A type of concept that is a property or predicate, used throughout the model. This includes most of the reserved tokens used in the IM classes and the IM language itself
{axiom} | 1..* | Property axiom | An axiom normally used to define a property, including domain, range, sub-property, and characteristics such as being transitive or the inverse of another property
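The effect of sub-property and inverse property axioms on a graph can be sketched as follows. The property names and the single asserted triple are hypothetical, and the sketch assumes each property has at most one super-property and that the hierarchy is acyclic. From one asserted primary diagnosis it derives the more general `hasDiagnosis` link and the reverse-direction `isDiagnosisOf` link.

```python
# Hypothetical property axioms (one super-property each, assumed acyclic).
SUB_PROPERTY_OF = {"hasPrimaryDiagnosis": "hasDiagnosis"}
INVERSE_OF = {"hasDiagnosis": "isDiagnosisOf"}
# inverseOf works in both directions.
INVERSE_OF.update({inv: prop for prop, inv in list(INVERSE_OF.items())})

# A single asserted triple.
TRIPLES = {("encounter1", "hasPrimaryDiagnosis", "Asthma")}


def super_properties(prop: str) -> list[str]:
    """The property followed by its super-properties, most specific first."""
    chain = [prop]
    while chain[-1] in SUB_PROPERTY_OF:
        chain.append(SUB_PROPERTY_OF[chain[-1]])
    return chain


def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add the triples entailed by the sub-property and inverse axioms."""
    inferred = set(triples)
    for subject, prop, obj in triples:
        for p in super_properties(prop):
            inferred.add((subject, p, obj))
            if p in INVERSE_OF:
                inferred.add((obj, INVERSE_OF[p], subject))
    return inferred


for triple in sorted(infer(TRIPLES)):
    print(triple)
```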

Value set

This is a specialised class that defines and holds a collection of concepts, those concepts not necessarily being related by subclass relationships.

The member of a value set is defined by a simple form of query called an expression constraint, which defines a collection of classes as described below. A value set without members can be used to group subclass value sets, its members then being inferred from theirs.

A value set, like any other class, can be ontologically defined, e.g. as a subclass of another value set. Thus if a value set has members defined, the subclass members would be subsets of the superclass members. Conversely, when a query selects a value set that has no members but does have subclasses, the inference engine includes all the members of all of the subclasses.

Property alias | Cardinality | Type | Description
type | 1 | "ValueSet" | A type of concept that is used as a value set, a specialised class for defining concepts in a query
subClassOf | 0..* | Class expression | A value set may be a subclass of another value set
member | 0..1 | Expression constraint | A specialised form of query that defines a collection of classes that would be subsumed when the query is run
expansion | 0..* | IRI | A list of concept identifiers produced by inference from the member definition or, in the absence of a member definition, simply a list of concepts
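Value set expansion, including the case of a value set with no members of its own, might be sketched like this. To keep it short, the member definition is reduced to a list of focus concepts, each included with its descendants, rather than a full expression constraint; the concept names, value set names and table layout are all hypothetical, and the value set hierarchy is assumed to be acyclic. The memberless `CardiometabolicCodes` value set is expanded purely from its subclass value sets.

```python
# Hypothetical concept hierarchy: parent -> direct children.
CHILDREN = {
    "Diabetes": {"Type1Diabetes", "Type2Diabetes"},
    "Hypertension": {"EssentialHypertension"},
}

# Hypothetical value sets. "member" is simplified to focus concepts that are
# included with their descendants; None means the value set has no members.
VALUE_SETS = {
    "DiabetesCodes": {"member": ["Diabetes"], "subClassOf": ["CardiometabolicCodes"]},
    "HypertensionCodes": {"member": ["Hypertension"], "subClassOf": ["CardiometabolicCodes"]},
    "CardiometabolicCodes": {"member": None, "subClassOf": []},
}


def descendants_or_self(concept: str) -> set[str]:
    result, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(CHILDREN.get(c, ()))
    return result


def expansion(value_set: str) -> set[str]:
    """Expand a value set; with no member definition, infer it from subclass value sets."""
    member = VALUE_SETS[value_set]["member"]
    if member is not None:
        return set().union(*(descendants_or_self(c) for c in member))
    subclasses = [vs for vs, d in VALUE_SETS.items() if value_set in d["subClassOf"]]
    return set().union(*(expansion(vs) for vs in subclasses))


print(sorted(expansion("CardiometabolicCodes")))
```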

Expression constraint

An expression constraint is a specialised query that describes a set of concepts using class expressions and Boolean logic, i.e. it describes the attributes that a concept must have to be included in or excluded from the set. Because ontological classes are defined as things with certain properties and values or value types (attribute-value pairs), the definition can include simple constraints such as something being a subclass of another concept, evaluated using inference.

An expression constraint is a string in one of the grammars supported by the IM, e.g. the Discovery expression constraint grammar, a SPARQL fragment, or the Snomed-CT Expression Constraint Language.
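To show how an expression constraint selects concepts, the sketch below evaluates a tiny ECL-like subset: `<< X` (the concept and all its descendants) combined with `OR`, `AND` and `MINUS`, applied strictly left to right. The operator symbols follow the Snomed-CT Expression Constraint Language, but the parser, the concept names and the hierarchy are hypothetical, and none of ECL's attribute refinements, brackets or term syntax is handled.

```python
# Hypothetical hierarchy: parent -> direct children.
CHILDREN = {
    "Diabetes": {"Type1Diabetes", "Type2Diabetes"},
    "Type2Diabetes": {"Type2DiabetesWithRetinopathy"},
}

# Binary operators, applied strictly left to right in this sketch.
OPERATORS = {"OR": set.union, "AND": set.intersection, "MINUS": set.difference}


def descendants_or_self(concept: str) -> set[str]:
    result, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(CHILDREN.get(c, ()))
    return result


def evaluate(constraint: str) -> set[str]:
    """Evaluate e.g. '<< Diabetes MINUS << Type1Diabetes' (whitespace-separated tokens)."""
    tokens = constraint.split()
    result, op, i = None, None, 0
    while i < len(tokens):
        token = tokens[i]
        if token in OPERATORS:
            op, i = token, i + 1
            continue
        if token == "<<":                       # descendant-or-self of the next concept
            term, i = descendants_or_self(tokens[i + 1]), i + 2
        else:                                   # a single concept on its own
            term, i = {token}, i + 1
        if result is None:
            result = term
        elif op is None:
            raise ValueError("two concepts without an operator between them")
        else:
            result, op = OPERATORS[op](result, term), None
    return result or set()


print(sorted(evaluate("<< Diabetes MINUS << Type1Diabetes")))
# ['Diabetes', 'Type2Diabetes', 'Type2DiabetesWithRetinopathy']
```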

Collection

A collection is a constrained form of concept, in that its concept type is one of the collection subtypes.

Collection subtypes are either lists or sets, and lists may be ordered or unordered. Lists such as folders are used to initiate user navigation of the model. Collection contents have no inherent relationship with the collection concept. N.B. Collections in this context should not be confused with the collection construct used in the language.

A collection is defined by its type, e.g. a folder.

Shape (data model)

A shape extends a concept and is the mainstay of the data modelling section of the information model.

A shape dictates the properties and values used in a set of business-oriented data stores, i.e. it defines and constrains the properties for particular purposes. On the surface a shape seems similar to a semantic class, i.e. the properties described in a shape are all properties that one would expect to be properties of a class (e.g. date of birth as a property of a person). However, a shape is designed to be more prescriptive and "closed world". Consequently a shape can be used to define a database schema or a message schema, and to validate data content.

The shape constraint language is the major part of the modelling language and is based on the W3C SHACL language.

A shape is a shape of something, i.e. it has a target class or target properties, and thus a shape is by default either a class shape or a property shape. The connection between a shape and a corresponding class (e.g. the shape of a person) brings together the semantic ontology and the data modelling. Class shapes contain property shapes, which may be embedded inline rather than defined separately.

Property alias | Cardinality | Type | Description
type | 1 | Shape | This is a shape class
{targetClass} | 0..1 | Class | What the shape is a shape of. In the health information model the alias 'record of' is likely to be used
targetSubjectOf | 0..1 | Property | The predicate that this shape is the subject of
targetObjectOf | 0..1 | Property | The predicate that this shape is the object of
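The closed-world character of a shape is easiest to see in validation. The sketch below is loosely modelled on SHACL's `sh:minCount`, `sh:maxCount`, `sh:datatype` and `sh:closed` constraints, but it is plain Python, not a SHACL engine, and the `Shape`, `PropertyShape` and `validate` names are hypothetical. It reuses the date of birth example from the validation requirement: exactly one date of birth is required, and properties the shape does not declare are rejected.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PropertyShape:
    path: str                        # the property being constrained
    datatype: type                   # expected Python type of each value
    min_count: int = 0
    max_count: Optional[int] = None  # None means unbounded


@dataclass(frozen=True)
class Shape:
    target_class: str                # what the shape is a shape of ("record of")
    properties: tuple[PropertyShape, ...]
    closed: bool = True              # closed world: undeclared properties are errors


def validate(shape: Shape, record: dict[str, list]) -> list[str]:
    """Return a list of constraint violations; an empty list means valid."""
    errors = []
    for p in shape.properties:
        values = record.get(p.path, [])
        if len(values) < p.min_count:
            errors.append(f"{p.path}: at least {p.min_count} value(s) required")
        if p.max_count is not None and len(values) > p.max_count:
            errors.append(f"{p.path}: at most {p.max_count} value(s) allowed")
        errors += [f"{p.path}: {v!r} is not a {p.datatype.__name__}"
                   for v in values if not isinstance(v, p.datatype)]
    if shape.closed:
        declared = {p.path for p in shape.properties}
        errors += [f"{k}: not declared in the shape" for k in record if k not in declared]
    return errors


PERSON_SHAPE = Shape("Person", (
    PropertyShape("dateOfBirth", date, min_count=1, max_count=1),
    PropertyShape("name", str),
))

print(validate(PERSON_SHAPE, {"dateOfBirth": [date(1970, 1, 1)], "name": ["Ann"]}))  # []
print(validate(PERSON_SHAPE, {"dateOfBirth": [date(1970, 1, 1), date(1971, 1, 1)]}))
```

The same property shapes could equally drive schema generation, which is why a shape can serve as database schema, message schema and validator at once.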

Classification

The models include modular classifications of concepts. The classification modules are either generated from the ontology via classifiers (which are functions of reasoners) or have been incorporated as handcrafted classifications. Examples of ontology-generated classifications are the Snomed-CT "ISA" hierarchy and the Discovery health classification. Examples of handcrafted classification modules are ICD10 and Read. The main difference between the two is that concepts in classifications generated from an ontology subsume their descendants as proper subtypes, whereas handcrafted classifications may include subcategories that are inconsistent.

To illustrate the difference between an ontology and a classification, let us say that we state in an ontology that "ALL THINGS THAT BARK ARE DOGS".

Let us then say we go to the beach at Ravenscar on the North East coast of England and hear a bark. We see the animal at a distance. We ask the computer what it is. The computer, using the generated classification, would classify that animal automatically as a DOG (because it barks).

However, as we get closer we see that it is something else. The ontology is clearly incorrect. Consequently we amend the ontology to state that there is such a thing as "AN ANIMAL THAT BARKS" and that "A DOG IS A SUBCLASS OF AN ANIMAL THAT BARKS". We also state that such a thing as an "ANIMAL" exists and that "AN ANIMAL THAT BARKS IS A SUBCLASS OF AN ANIMAL". Now, when asked what the animal is, the computer knows only that it is an animal that barks, not what kind of animal it is.

We then amend the ontology to state that such a thing as a SEAL exists and that "A SEAL BARKS". We also author the ontology to say that "AN ANIMAL THAT BARKS IS EQUIVALENT TO A THING THAT BARKS", i.e. by definition, if it barks it is an animal that barks. Now the computer automatically classifies the seal, stating that "A SEAL IS AN ANIMAL THAT BARKS" (because it is a thing that barks and must therefore be an animal that barks). The seal will then be found when searching for types of animals that bark, things that bark, and things that are animals, i.e. the seal MUST be an animal because all things that bark are animals that bark, and an animal that barks is a subclass of an animal. The human has not needed to find the category: the reasoner has automatically created the classification from the properties of the thing on the beach.
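The beach example can be reproduced with a toy classifier. This is a deliberately simplified sketch, not an OWL reasoner: each defined class is reduced to a set of required properties (so `ThingThatBarks` and `AnimalThatBarks`, having the same definition, are equivalent), and all names are hypothetical. Nobody asserts where `Seal` belongs, yet the classifier places it under `AnimalThatBarks` and `Animal`, and the unidentified thing on the beach is not classified as a dog.

```python
# Defined classes: a required set of properties that is both necessary and
# sufficient. Equal definitions make ThingThatBarks and AnimalThatBarks equivalent.
DEFINITIONS = {
    "ThingThatBarks": {"barks"},
    "AnimalThatBarks": {"barks"},
}

# Asserted subclass axioms.
ASSERTED_PARENTS = {
    "Dog": {"AnimalThatBarks"},
    "AnimalThatBarks": {"Animal"},
}

# Asserted properties of the things we want classified.
PROPERTIES = {
    "Seal": {"barks"},
    "thingOnTheBeach": {"barks"},
}


def direct_parents(entity: str) -> set[str]:
    """Asserted parents plus every defined class whose definition the entity satisfies."""
    props = PROPERTIES.get(entity, set()) | DEFINITIONS.get(entity, set())
    parents = set(ASSERTED_PARENTS.get(entity, set()))
    for cls, required in DEFINITIONS.items():
        if cls != entity and required <= props:
            parents.add(cls)
    return parents


def classify(entity: str) -> set[str]:
    """All inferred superclasses of the entity, following parents transitively."""
    seen, stack = set(), [entity]
    while stack:
        for parent in direct_parents(stack.pop()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    seen.discard(entity)   # equivalence cycles can lead back to the entity itself
    return seen


print(sorted(classify("Seal")))              # ['Animal', 'AnimalThatBarks', 'ThingThatBarks']
print("Dog" in classify("thingOnTheBeach"))  # False
```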


Information model storage architecture

[Diagram: IM logical object model]

An information model is an abstract representation of data, but it must also have content, and that content must be stored.

Data cannot be stored conceptually, only physically, and thus there must be a relationship between the abstract model and a physical store.

In the information model services, the abstract model is instantiated as a set of objects of classes, the data elements of those classes holding the subject, predicate and object structures. In practice those objects, together with translation and data access methods, are implemented in a programming language, e.g. Java.

The physical store is currently a triple-like relational database accessed by a relational database engine, but it could easily be held as a native graph.
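A triple-like relational store can be sketched with SQLite from the Python standard library. The single `triple` table and the short predicate names are assumptions for illustration, not the Discovery schema. The recursive common table expression shows how a subsumption query can run inside the relational engine by walking `subClassOf` rows.

```python
import sqlite3

# Hypothetical schema: one row per (subject, predicate, object) triple.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triple ("
    " subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL,"
    " PRIMARY KEY (subject, predicate, object))"
)
conn.executemany(
    "INSERT INTO triple (subject, predicate, object) VALUES (?, ?, ?)",
    [
        ("Type1Diabetes", "subClassOf", "Diabetes"),
        ("Type2Diabetes", "subClassOf", "Diabetes"),
        ("Diabetes", "subClassOf", "EndocrineDisorder"),
        ("Diabetes", "name", "Diabetes mellitus"),
    ],
)

# Subsumption query: the recursive CTE walks subClassOf rows downwards.
# UNION (rather than UNION ALL) discards duplicates, so cycles terminate.
DESCENDANTS_SQL = """
WITH RECURSIVE descendant(iri) AS (
    SELECT ?
    UNION
    SELECT t.subject
    FROM triple AS t JOIN descendant AS d ON t.object = d.iri
    WHERE t.predicate = 'subClassOf'
)
SELECT iri FROM descendant ORDER BY iri
"""


def descendants(iri: str) -> list[str]:
    return [row[0] for row in conn.execute(DESCENDANTS_SQL, (iri,))]


print(descendants("EndocrineDisorder"))
# ['Diabetes', 'EndocrineDisorder', 'Type1Diabetes', 'Type2Diabetes']
```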

The model can then be used as the source and target of data exchange, the exchange itself using a language that interoperates via a set of APIs.

This can be visualised as in the diagram on the right. The inner physical store is accessed by an object model layer, which is itself accessed by APIs using modelling language grammar and syntax. The diagram shows the main grammars supported by the Discovery information model, including the Discovery information modelling language grammar itself.

Support for the main languages means that a Discovery information model instance has two levels of separation of concerns: from the languages used to exchange data, and from the underlying model store. There is thus no need to buy into the Discovery language in order to use the information model.
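The two levels of separation can be sketched as three small layers: a store that only knows triples, an object model built from the store, and exchange grammars implemented as interchangeable serialisers over the object model. All class and function names are hypothetical, the `example.org` IRIs are placeholders, and the N-Triples writer handles IRI objects only (literals would need quoting). Adding a grammar means adding a serialiser; neither the store nor the object model changes.

```python
import json
from dataclasses import dataclass

RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


class TripleStore:
    """Physical layer: stores triples and knows nothing about concepts or grammars."""

    def __init__(self, triples):
        self._triples = list(triples)

    def about(self, subject):
        return [(p, o) for s, p, o in self._triples if s == subject]


@dataclass
class ConceptObject:
    """Object model layer: built from the store, independent of any exchange grammar."""
    iri: str
    properties: dict[str, list[str]]

    @classmethod
    def load(cls, store, iri):
        props: dict[str, list[str]] = {}
        for p, o in store.about(iri):
            props.setdefault(p, []).append(o)
        return cls(iri, props)


# API layer: each grammar is an interchangeable serialiser over the same objects.
def to_ntriples(concept):
    # Assumes every object is an IRI; literals would need quoting.
    return "\n".join(f"<{concept.iri}> <{p}> <{o}> ."
                     for p, objects in concept.properties.items() for o in objects)


def to_json(concept):
    return json.dumps({"iri": concept.iri, "properties": concept.properties}, sort_keys=True)


store = TripleStore([("http://example.org/im#Diabetes", RDFS_SUBCLASS,
                      "http://example.org/im#EndocrineDisorder")])
diabetes = ConceptObject.load(store, "http://example.org/im#Diabetes")
print(to_ntriples(diabetes))
print(to_json(diabetes))
```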

Likewise, an implementation of objects that hold data in a form compatible with a particular data model and ontology module can be accessed using the same language.

This makes the language just as useful for exchanging query definitions and value sets as for actually querying health record stores via interpreters.

The remainder of this article describes the language itself, starting with some high level sections on the components, and eventually providing a specification of the language and links to technical implementations, all of which are open source.


Data models and value sets

Business domain data models are modules that define relationships in the context of a particular business or set of businesses and include the health data models.

A model is only relevant for a particular set of business purposes, and there is no single model that can accommodate all business purposes, although common information models can accommodate quite broad purposes. A reasonably well understood set of business purposes is referred to in these topics as a "business domain" or "domain of interest", and a particular information model is designed to cover a business domain.

Examples of data model modules are the core Discovery health data model, and the PRSB core record model. Related message models such as FHIR profiles or openEHR archetypes are examples of business domain specific data models. Specialist data models may exist for particular business purposes such as cancer data set definitions or be more general, such as the Discovery common data model. The thing to note about the Discovery data models is that all concepts are defined in the semantic ontology.

The Discovery common data model (a broad model for each domain) will generally include data relationships needed by many domains, arranged so that inconsistency and unreliability are avoided.

Data models also define the expected values of properties. Sometimes these values are class statements (e.g. has colour -> Colour, meaning that the colour of something is a colour), but more often they are sets of concepts brought together for business purposes: value sets.

Query Library

A variation on a conventional ontology is that concept properties can also be defined according to functional definitions, as expressed in the query language. The model contains a library of query definitions that, like data models, are usually business specific.

For example, one would expect a record of a person's religion to be the concept of "Person's religion". In a data model this might be defined as "Person -> has religion -> Religion", i.e. the value of the religion property of a person is a religion, e.g. Hindu.

However, the same person may have many religions recorded about them during their lifetime. Thus the definition of a person's religion is more likely to be "the latest religion for this person" and that is a functional property.
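A functional property such as "the latest religion for this person" can be sketched as a function over time-stamped records rather than as a stored value. The `Observation` record, the `latest` function and the data are hypothetical; the sketch returns `None` when nothing is recorded and does not resolve ties between records with the same date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Observation:
    patient: str
    predicate: str      # e.g. "hasReligion"
    value: str
    effective: date     # when the value was recorded as applying


RECORDS = [
    Observation("p1", "hasReligion", "Christian", date(1990, 5, 1)),
    Observation("p1", "hasReligion", "Hindu", date(2015, 3, 12)),
]


def latest(records, patient: str, predicate: str) -> Optional[str]:
    """Functional property: the most recently recorded value, or None if nothing is recorded."""
    matching = [r for r in records if r.patient == patient and r.predicate == predicate]
    if not matching:
        return None
    return max(matching, key=lambda r: r.effective).value


print(latest(RECORDS, "p1", "hasReligion"))  # Hindu
print(latest(RECORDS, "p2", "hasReligion"))  # None
```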

A query is essentially a definition of a concept using both standard and functional properties, together with the related value sets, plus instructions as to which properties to return.

Ontologies and modules

The Discovery common information model can be thought of as an ontology of ontologies. More precisely though it should be considered as an ontology consisting of a set of ontology modules with each module defined according to business needs. The principle of concept sharing, whereby one concept is identified once across the entire set of domains, suggests that there is a single ontology. However a data model that is specified for a particular business purpose may have different class structures from another business purpose even though they share the same semantic definitions.

For example, take the idea of recording information about a blood pressure. This is an example of a component in a data model. In general practice, it would be common to record a systolic and a diastolic blood pressure, and thus the component would consist of 3 classes. However, in a specialist research study involving different interpretations of blood pressures, including perhaps the size or nature of the cuff or the exact position of the patient, this component may be more complex.

This is addressed by modularisation where the axioms that define the classes belong to a particular model, even though the property domains and their ranges are shared across the ontology. This is analogous to the idea of templates derived from subsets of archetypes. The difference is that there is no "super-archetype" requiring international agreement on the items in the archetype, but instead there is a demand that the same identifier of the diastolic blood pressure record class is used throughout, even though the class definition is business specific.
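Modularisation can be sketched as shared property definitions plus module-specific class definitions keyed by the same class identifier. All identifiers (with an assumed `im:` prefix) and module names are hypothetical. Both modules define `im:DiastolicBloodPressure`, but the research module's definition uses more properties, and a simple check confirms that every property a module uses is defined once, globally.

```python
# Shared across all modules: each property is identified once, with one range.
SHARED_PROPERTIES = {
    "im:value": "im:Quantity",
    "im:cuffSize": "im:CuffSize",
    "im:patientPosition": "im:BodyPosition",
}

# Module-specific definitions of the SAME class identifier.
MODULES = {
    "GeneralPractice": {
        "im:DiastolicBloodPressure": ["im:value"],
    },
    "ResearchStudy": {
        "im:DiastolicBloodPressure": ["im:value", "im:cuffSize", "im:patientPosition"],
    },
}


def definition(module: str, cls: str) -> dict[str, str]:
    """A class as one module defines it, with ranges taken from the shared ontology."""
    return {p: SHARED_PROPERTIES[p] for p in MODULES[module][cls]}


def undefined_properties() -> list[tuple[str, str, str]]:
    """(module, class, property) triples that use a property not defined globally."""
    return [(m, c, p)
            for m, classes in MODULES.items()
            for c, props in classes.items()
            for p in props if p not in SHARED_PROPERTIES]


print(definition("GeneralPractice", "im:DiastolicBloodPressure"))
print(definition("ResearchStudy", "im:DiastolicBloodPressure"))
print(undefined_properties())  # []
```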