Discovery health information model

From Discovery Data Service
Revision as of 12:04, 25 May 2020 by DavidStables

The Discovery common information model consists of an integrated set of five main types of components that support machine-processable and human-readable information about data in the Discovery Data Service.

The model has a much broader scope than a data model alone. It involves modelling the meaning of the data and the way in which data is converted to information via query.

More specifically, in this context, the information model is tailored to illustrate the arrangement of health and care data held in health records, for the purposes of understanding and querying that data as described by the information model services.

The information model is built using the Discovery information modelling language, a standards-based mixed language (which could be considered a meta language).

 

Objectives of the common information model

The common information model is a set of components designed to contribute to achieving the following objectives:

  • Enable people who are not technical experts to visualise and understand the structure and content of health records.
  • Enable people who are technical experts to design systems based on the logical structure and content of the model.
  • Enable people to define the data they need in order to perform advanced analytics or decision support, in particular where the definition involves subsumption testing.
  • Enable query authors to draw on a library of value sets and query definitions for re-use across the health sector.

The model is technology- and system-independent, so it can be implemented in any technology of choice.

The model language is standards-based, so the model content can be exchanged using standards-based messages. However, Discovery also uses a simpler JSON-based syntax that is easier to comprehend and to parse in object-oriented programming languages. In addition, the outputs of the model can be relational, so they can easily be used by relational database (RDB) implementations.
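As a rough illustration of why a JSON syntax suits object-oriented languages and relational outputs, the sketch below parses a single concept definition into a typed object and flattens it into rows. The JSON field names (`iri`, `name`, `subClassOf`), the `ex:` identifiers and the helper names are hypothetical assumptions for illustration, not the actual Discovery JSON syntax.

```python
import json
from dataclasses import dataclass, field

# Hypothetical JSON shape for one concept; the real Discovery syntax may differ.
CONCEPT_JSON = """
{
  "iri": "ex:Asthma",
  "name": "Asthma",
  "subClassOf": ["ex:RespiratoryDisorder"]
}
"""


@dataclass
class Concept:
    iri: str
    name: str
    sub_class_of: list[str] = field(default_factory=list)


def parse_concept(text: str) -> Concept:
    """Parse the JSON document directly into a typed object."""
    data = json.loads(text)
    return Concept(iri=data["iri"], name=data["name"],
                   sub_class_of=data.get("subClassOf", []))


def to_rows(concept: Concept) -> tuple[tuple[str, str], list[tuple[str, str]]]:
    """Flatten one concept into a concept row plus one row per parent link."""
    concept_row = (concept.iri, concept.name)
    parent_rows = [(concept.iri, parent) for parent in concept.sub_class_of]
    return concept_row, parent_rows


concept = parse_concept(CONCEPT_JSON)
print(to_rows(concept))
```

A relational implementation could load the two outputs into a concept table and a subclass table respectively.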

At the core of the model is an ontology. The greater part of the ontology is based on the world-leading health ontology Snomed-CT, which is itself now defined using the same ontology language, OWL2.

 

Information model component types

The model integrates different types of components, linked together by a common modelling language that is both human readable (albeit quite technical) and machine readable. The components work together, so generating output or querying content usually involves several components.

This diagram illustrates the main types of components of the model:

[Figure: Information model components]
  • The information modelling language is the machine- and human-readable set of instructions stating what things mean, how they are classified, and which classes and properties they have. The language can be seen as a set of instructions for populating a model.
  • The semantic ontology is the set of concepts used in all parts of the information model, from clinical concepts through to data structure concepts.
  • The data model is a set of entities, attributes and value sets, all of which are defined precisely in the ontology. However, the data model, being created for the specific business of healthcare, is separate from the ontology.
  • Value sets are business-purpose-specific collections of concepts from the ontology, used in the data model or in query. They contain concepts as defined in the ontology, expressed using the ontology language, including advanced concept classes.
  • Data set definitions apply rules and filters to a data model in order to specify the nature and content of the entries required in a purpose-specific data set.
  • Data model maps specify how data is transformed from a data model to a particular database.
  • Database schemas are reference schemas (RDB and maps) showing an implementation of a data model and data sets. Strictly speaking, these are not part of the information model but are included as “proof of solution” of the model.
  • Derived attributes are data model attributes defined using the query language.
  • Query definitions are a library of re-usable queries.

Modelling components

This section describes the elements of the information model that deal with the types and processes of modelling rather than with the content of the model itself. It is the starting point for those wishing to understand how the Discovery common information model differs from other information models, and how its modelling approaches use and adapt standards and other initiatives.

Semantic ontology

The Discovery ontology defines the meaning of the concepts that make up the content of health records. The meaning is defined in a way that a computer can use to reason and analyse.

In reality, the ontology is a semantic web of ontologies, but in most cases the external ontologies are more accurately described as classifications or code schemes.

The exception is the world-leading Snomed-CT ontology, which is now based on a type of language known as description logic and is made available via three grammars: OWL2, Snomed compositional grammar and Expression constraint language.

Main ontology structures

The ontology is made up of a number of concepts (classes or properties) which are the subjects of axioms; the axioms relate concepts to other concepts in a fractal-like manner. The relationship can be illustrated as follows:

The ontology is precisely defined using the Discovery semantic ontology language, which is itself a syntactic simplification of the standard OWL2 language. The Discovery language exists in order to accommodate additional constructs not covered by OWL, namely data set definitions, value set definitions and transactional messaging.
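Subsumption testing, one of the objectives of the model, can be sketched over a handful of subclass axioms. This is a minimal sketch that assumes only asserted `subClassOf` links between named concepts; a real OWL2 or description logic reasoner also infers subsumption from concept definitions, which this does not attempt. The concept identifiers and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical asserted axioms: (subclass, superclass) pairs between named concepts.
AXIOMS = [
    ("ex:Asthma", "ex:RespiratoryDisorder"),
    ("ex:AllergicAsthma", "ex:Asthma"),
    ("ex:RespiratoryDisorder", "ex:Disorder"),
]


def build_parents(axioms):
    """Index each concept's direct superclasses."""
    parents = defaultdict(set)
    for child, parent in axioms:
        parents[child].add(parent)
    return parents


def subsumes(parents, ancestor, concept):
    """True if `ancestor` subsumes `concept` (reflexive and transitive)."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


PARENTS = build_parents(AXIOMS)
print(subsumes(PARENTS, "ex:Disorder", "ex:AllergicAsthma"))
```

The `seen` set guards against cycles, so the walk terminates even if the axioms are inconsistent.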



Value sets

Main article: Value sets

A value set definition, and its run-time counterpart, the value set transitive closure, is a set of class expressions collected together for a particular business purpose.

Value sets serve a range of purposes. Examples include defining a data set according to a set of recorded concepts, indicating the expected range of a property in a health record, and testing for the presence of a feature in a patient record.
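The relationship between a value set definition and its expanded run-time form can be sketched as follows. This assumes a definition made of included and excluded concepts, each taken together with all of its descendants, and treats the expansion as the resulting member set. The `include`/`exclude` keys, the concept identifiers and the function names are hypothetical, and real definitions may use richer class expressions.

```python
from collections import defaultdict

# Hypothetical (child, parent) subclass links from the ontology.
IS_A = [
    ("ex:Asthma", "ex:RespiratoryDisorder"),
    ("ex:AllergicAsthma", "ex:Asthma"),
    ("ex:COPD", "ex:RespiratoryDisorder"),
]


def children_index(is_a):
    """Index each concept's direct subclasses."""
    children = defaultdict(set)
    for child, parent in is_a:
        children[parent].add(child)
    return children


def descendants_or_self(children, concept):
    """Collect a concept and everything beneath it."""
    result, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children.get(current, ()))
    return result


def expand(definition, is_a):
    """Expand a value set definition into its run-time member set."""
    children = children_index(is_a)
    members = set()
    for concept in definition["include"]:
        members |= descendants_or_self(children, concept)
    for concept in definition.get("exclude", []):
        members -= descendants_or_self(children, concept)
    return members


# Hypothetical value set: respiratory disorders other than COPD.
RESPIRATORY_NO_COPD = {"include": ["ex:RespiratoryDisorder"], "exclude": ["ex:COPD"]}
print(sorted(expand(RESPIRATORY_NO_COPD, IS_A)))
```

Pre-computing the expansion once means a run-time membership test is a simple set lookup.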

Data model

The data model defines the ever-evolving structure of health records held within Discovery, built as a common model.

The data models cover the structural arrangements of care record entities (tables or nodes), attributes (fields, foreign keys, or properties/relationships) and their ranges, which might be other entities, simple data types or concept instances.

The model is built using the Data modelling language, a simple and pragmatic language that supports property graphs.

Most people consider data structures from a relational or graph perspective i.e. by the application of set theory (relational) or graph theory (graph databases). The two approaches are almost logically interchangeable in that entities are considered as nodes or vertices, and entity relationships are considered as edges or relationships.
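To make the near-interchangeability concrete, the sketch below turns two hypothetical relational tables into a property graph: each row becomes a node and each foreign key becomes an edge. The table names, columns and the `subject` edge label are assumptions for illustration only.

```python
# Hypothetical relational rows: observations reference patients via a foreign key.
PATIENTS = [{"id": 1, "name": "A"}]
OBSERVATIONS = [{"id": 10, "patient_id": 1, "code": "ex:Asthma"}]


def to_graph(patients, observations):
    """Map rows to nodes (keyed by entity and id) and foreign keys to edges."""
    nodes = {}
    edges = []
    for row in patients:
        nodes[("Patient", row["id"])] = {"name": row["name"]}
    for row in observations:
        nodes[("Observation", row["id"])] = {"code": row["code"]}
        # The foreign key column becomes a labelled edge to the patient node.
        edges.append((("Observation", row["id"]), "subject", ("Patient", row["patient_id"])))
    return nodes, edges


nodes, edges = to_graph(PATIENTS, OBSERVATIONS)
print(edges)
```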

There are some similarities between a data model and the ontology and some people consider both to be ontologies. The most significant similarity is that they both use the ideas of class, property and property value, albeit using different naming conventions for these things.

Within Discovery, the ontology defines concepts that represent sets of objects with certain properties and value ranges, from the perspective of meaning. A data model describes classes, properties and value ranges arranged for business purposes. Both involve collections of class/property/value ranges. For convenience, data models tend to use the term attribute to cover both properties with their value types and relationships that link classes.

Whilst there is a clear distinction of purpose, the two components overlap, because the concepts used within a data model have their meaning defined within the ontology. A handover between the two occurs when a business process, such as a record query, takes account of both.
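The handover can be pictured as a record query in which the data model supplies the records and their concept-valued attribute, while the ontology answers whether each recorded concept falls under the concept being sought. Both the tiny ontology and the observation records below are hypothetical, and only asserted parent links are followed.

```python
# Ontology side: hypothetical direct parents of named concepts.
PARENTS = {
    "ex:AllergicAsthma": {"ex:Asthma"},
    "ex:Asthma": {"ex:RespiratoryDisorder"},
}

# Data model side: hypothetical observation records with a concept-valued attribute.
OBSERVATIONS = [
    {"patient_id": 1, "code": "ex:AllergicAsthma"},
    {"patient_id": 2, "code": "ex:Fracture"},
]


def ancestors_or_self(concept):
    """Ontology lookup: the concept plus all of its asserted ancestors."""
    result, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(PARENTS.get(current, ()))
    return result


def patients_with(concept):
    """Data model filter that hands each recorded code to the ontology for a subsumption test."""
    return sorted({obs["patient_id"] for obs in OBSERVATIONS
                   if concept in ancestors_or_self(obs["code"])})


print(patients_with("ex:Asthma"))
```

A patient recorded with the more specific allergic asthma concept is still found by a query for asthma, which is the benefit of consulting the ontology.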

Model content

Main article: Discovery health data model

In line with the information model itself, the model content is constantly evolving. These articles provide an overview of content at a particular point in time, as high level background. There is likely to be a discrepancy between the documentation on this wiki and the information model itself. The model content is best viewed via the information model viewer, although the viewer itself may also reference the wiki.

Data Set definitions

Data set definitions are a key component of the information model.

A data set definition is a specification of a subset of data derived from one or more data models.

Once established, a data set definition can also be used as a source data model, so data sets can be chained by placing a data set in the role of a data model.

A data set uses query-like constructs to define its structures. Data set entities and data set attributes may be derived from a combination of ontology and data model queries. To that extent, a data set definition can be said to use a query language.
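A data set definition that applies a filter and an attribute selection, and its chaining into a second definition, might look like the sketch below. The `DataSetDefinition` class, the use of Python functions as filters and the sample rows are hypothetical; a real definition would be written in the Discovery language rather than as Python code.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict


@dataclass
class DataSetDefinition:
    """Hypothetical data set definition: a row filter plus the attributes to keep."""
    name: str
    where: Callable[[Row], bool]
    attributes: list[str]

    def apply(self, source: list[Row]) -> list[Row]:
        return [{a: row[a] for a in self.attributes} for row in source if self.where(row)]


# Hypothetical source data model rows.
OBSERVATIONS = [
    {"patient_id": 1, "code": "ex:Asthma", "year": 2019},
    {"patient_id": 2, "code": "ex:Asthma", "year": 2015},
    {"patient_id": 3, "code": "ex:Fracture", "year": 2019},
]

recent = DataSetDefinition("recent", lambda r: r["year"] >= 2018, ["patient_id", "code"])
recent_asthma = DataSetDefinition("recent_asthma", lambda r: r["code"] == "ex:Asthma", ["patient_id"])

# Chaining: the output of the first data set acts as the source data model for the second.
result = recent_asthma.apply(recent.apply(OBSERVATIONS))
print(result)
```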

The Discovery data definition language is not designed to operate as an actual query language, as it does not include all the sophistication needed by a run-time query language. For example, it employs no optimisation techniques and makes no reference to indexes. However, the language is rich enough that SQL or Cypher can easily be generated from a specification when it is combined with a data model map to the implementation schema.
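Generating SQL from a specification plus a data model map can be sketched as below: model-level entity and attribute names are looked up in the map to find physical table and column names, and the filter values become query parameters. The map layout, the table and column names and the `to_sql` helper are hypothetical, and only equality filters are handled.

```python
# Hypothetical data model map: model entity/attribute names -> physical table/column names.
MODEL_MAP = {
    "Observation": {
        "table": "observation",
        "columns": {
            "patient": "patient_id",
            "concept": "concept_id",
            "effectiveDate": "effective_date",
        },
    }
}


def to_sql(entity, attributes, equals, model_map):
    """Render a simple select-and-filter specification as parameterised SQL."""
    mapping = model_map[entity]
    cols = mapping["columns"]
    select = ", ".join(cols[a] for a in attributes)
    sql = f"SELECT {select} FROM {mapping['table']}"
    params = []
    if equals:
        # Dicts keep insertion order, so placeholders and values line up.
        sql += " WHERE " + " AND ".join(f"{cols[a]} = ?" for a in equals)
        params = list(equals.values())
    return sql, params


print(to_sql("Observation", ["patient"], {"concept": "ex:Asthma"}, MODEL_MAP))
```

Using `?` placeholders keeps values out of the SQL text; the placeholder style depends on the database driver (Python's `sqlite3`, for example, accepts `?`).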

Data Maps

Data maps serve a variety of purposes, mainly:

  • Maps between the data model and an implementation schema, to enable auto-generation of query syntax such as SQL or Cypher
  • Maps from legacy data models and/or their values to the common information model
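A map from a legacy data model to the common model can combine an attribute map with a value map, as in the sketch below. The legacy field names, the legacy code `L123` and the target names are hypothetical. In this sketch unmapped fields are dropped and unmapped values are passed through unchanged, which is one possible policy among several.

```python
# Hypothetical maps from a legacy schema to the common information model.
FIELD_MAP = {"pat_id": "patient", "legacy_code": "concept"}
VALUE_MAP = {"legacy_code": {"L123": "ex:Asthma"}}


def to_common(legacy_row):
    """Rename mapped legacy fields and translate mapped legacy values."""
    common = {}
    for legacy_field, value in legacy_row.items():
        target = FIELD_MAP.get(legacy_field)
        if target is None:
            continue  # unmapped legacy fields are dropped in this sketch
        value_map = VALUE_MAP.get(legacy_field)
        # Values without a mapping entry pass through unchanged.
        common[target] = value_map.get(value, value) if value_map else value
    return common


print(to_common({"pat_id": 7, "legacy_code": "L123", "ward": "X"}))
```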