Collaborative Glossary: Difference between revisions

From Discovery Data Service
(41 intermediate revisions by 2 users not shown)
Line 5: Line 5:
!Definition
!Definition
!External Links
!External Links
|-
|Canonical Data Model
|This is a type of data model that presents data entities and relationships in the simplest possible form. It is a design pattern which allows communication between different data formats. ''Canonical'' form describes the simplest representation of an object.
|[[wikipedia:Canonical_model|Canonical model - Wikipedia]]
|-
|-
|Categories
|Categories
Line 26: Line 30:
It defines the transformation of data from multiple different databases with potentially multiple coding systems into a common storage format (data model) as well as a common representation (terminologies, vocabularies, coding schemes)
It defines the transformation of data from multiple different databases with potentially multiple coding systems into a common storage format (data model) as well as a common representation (terminologies, vocabularies, coding schemes)
|[https://www.jamesserra.com/archive/2019/06/common-data-model/ Microsoft Example]
|[https://www.jamesserra.com/archive/2019/06/common-data-model/ Microsoft Example]
|-
|Common Data Model (Discovery)
|The Discovery Common Data Model is the schema of the Discovery Compass v2 databases, which exist in the Discovery Core and are held as subscriber databases within ICS environments. The schema is documented [https://wiki.discoverydataservice.org/index.php?title=Remote_Subscriber_Database_(RSD)_Schema_(Compass_2) here]. It is thought to be based on the EMIS primary care system schema.
|
|-
|Compass Database (Compass v2)
|The database schema used in "subscriber" databases in DDS. This is often referred to as the DDS Common Data Model. There are currently three versions of the Compass database within the Discovery AWS core, one for each ICS: NEL, SEL and NWL. These are still referred to as "Subscriber" databases by the original DDS management team. There are four "Compass" databases outside the AWS core: one with NEL CEG, one with NEL ICS, one with SEL ICS and one in the NWL WSIC infrastructure.
|[[Compass 2 Schema Mappings]]
[[Compass Database Mappings]]
|-
|-
|Concept
|Concept
Line 34: Line 47:
|This is a high level model, describing business processes and how they relate to one another. In simple terms, it is a map of concepts and rules relating to the business which can then be used to define relationships between data entities in order to describe the business concepts – this leads to the creation of a logical data model
|This is a high level model, describing business processes and how they relate to one another. In simple terms, it is a map of concepts and rules relating to the business which can then be used to define relationships between data entities in order to describe the business concepts – this leads to the creation of a logical data model
|
|
|-
|Customer Managed Keys
|Customer Managed Keys are security keys that are created, owned, and managed by the customer. The customer has full control over the keys, including enabling and disabling them, setting key policies, rotating cryptographic material, and scheduling deletion. In simple terms, this means a supplier may hold the customer's data in their system (software as a service, for example), encrypted with the customer managed key. The supplier is unable to decrypt the data; only the customer can do so, as they hold the key.
|[https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk Customer Managed Keys - AWS]
[https://learn.microsoft.com/en-us/azure/security/fundamentals/key-management Key Management in Azure]
|-
|Data Controller or simply Controller
|Data controller means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law (Definition from Article 4 (7) GDPR)
|[https://www.privacy-regulation.eu/en/article-4-definitions-GDPR.htm Controller]
|-
|-
|Data Dictionary
|Data Dictionary
|A centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format e.g. NHS Data Dictionary.
|A centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format e.g. NHS Data Dictionary.
|[[wikipedia:Data_dictionary|Data Dictionary]]
|[[wikipedia:Data_dictionary|Data Dictionary]]
|-
|Data Lake
|A '''data lake''' is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video). A data lake can be established "on premises" (within an organization's data centers) or "in the cloud" (using cloud services from vendors such as Amazon, Microsoft, or Google).
Poorly-managed data lakes have been facetiously called data swamps.
|[[wikipedia:Data_lake|Data Lake]]
|-
|Data Mart
|A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area. Data marts make specific data available to a defined group of users, which allows those users to quickly access critical insights without wasting time searching through an entire data warehouse
|[https://www.ibm.com/cloud/learn/data-mart#:~:text=A%20data%20mart%20is%20a,through%20an%20entire%20data%20warehouse. Data Mart]
|-
|-
|Data Model
|Data Model
Line 45: Line 76:
|-
|-
|Data Platform
|Data Platform
|A Data Platform is any digital solution which enables an organisation to store, transform, aggregate, and analyse data. It integrates many different technologies for different purposes to meet the data needs of the organisation. It includes security and access controls to ensure IG compliance
|A Data Platform is any digital solution which enables a single organisation to store, transform, aggregate, and analyse data. It integrates many different technologies for different purposes to meet the data needs of the organisation. It includes security and access controls to ensure IG compliance
|
|
|-
|-
Line 63: Line 94:
|A data warehouse stores data for the main purpose of reporting and data analysis. They are central repositories which bring together disparate data sets into one place, to enable reporting at scale and pace.  
|A data warehouse stores data for the main purpose of reporting and data analysis. They are central repositories which bring together disparate data sets into one place, to enable reporting at scale and pace.  
|[[wikipedia:Data_warehouse|Data Warehouse]]
|[[wikipedia:Data_warehouse|Data Warehouse]]
|-
|Dataset
|A '''data set''' (or '''dataset''') is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the data set in question. The data set lists values for each of the variables, such as for example height and weight of an object, for each member of the data set.  Data sets can also consist of a collection of documents or files.
In the Discovery Data Service datasets may be generated as a result of queries and valueset filters on the internal relational database.
|[[wikipedia:Data_set|Dataset]]
|-
|-
|Discovery Collaborative
|Discovery Collaborative
Line 73: Line 110:
|-
|-
|Discovery Health Information Model
|Discovery Health Information Model
|The "toolkit" of the Discovery Data Service. 1) A set of ontologies and classifications 2) A common data model 3) A metadata library of clusters of codes (codesets, valuesets etc) 4) A catalogue of reference data (such as geographical areas, organisations and people) 5) A library of queries 6) A set of “maps” (mappings) 7) An Information Manager - an open source set of utilities that can be used to browse, search, or maintain the model.
|The "toolkit" of the Discovery Data Service. 1) A set of ontologies and classifications 2) A common data model 3) A metadata library of clusters of codes (codesets, valuesets etc) - currently the IM1 data tables in the DDS that are managed through IM2 (see below) 4) A catalogue of reference data (such as geographical areas, organisations and people) 5) A library of queries 6) A set of “maps” (mappings) 7) An Information Manager - currently Endeavour Information Manager 2 (IM2) - an open source set of utilities that can be used to browse, search, or maintain the model.
|[https://wiki.endeavourhealth.org/index.php?title=Discovery_health_information_model Discovery Health Information Model]
|[https://wiki.endeavourhealth.org/index.php?title=Discovery_health_information_model Discovery Health Information Model]
|-
|-
|Endeavour Information Model Manager
|Endeavour Information (Model) Manager
|The Information Model Manager is an application developed by the Endeavour Charitable Trust which provides a view of the Common Information Model and supports authoring of the content of the model. It can be used to view the model structure and content, download artefacts (e.g. value sets), and manage the ontology. It can be used to view entities and their relationships (e.g. an A&E admission, which is an Encounter and has a subject of a Patient and has a sub-encounter of a Triage Encounter etc)
|The Information Model Manager is an application developed by the Endeavour Charitable Trust which provides a view of the Common Information Model and supports authoring of the content of the model. It can be used to view the model structure and content, download artefacts (e.g. value sets), and manage the ontology. It can be used to view entities and their relationships (e.g. an A&E admission, which is an Encounter and has a subject of a Patient and has a sub-encounter of a Triage Encounter etc)
|[https://im.endeavourhealth.net/#/ Endeavour IM2]
|[https://im.endeavourhealth.net/#/ Endeavour IM2]
|-
|Environment (computing)
|An environment is the collection of computer machinery, data storage devices, work stations, software applications, and networks that support the processing and exchange of electronic information required by the software application. Typically there will be several environments for one system, often including a development environment (where developers can test out new code and see how it interacts), a staging, or test, environment (where new code is subject to an automated test suite comprising end-to-end tests, regression testing, unit and integration tests, as well as manual User Acceptance Testing), and a production environment (which is where the live application runs).
|
|-
|-
|Expression (Snomed CT)
|Expression (Snomed CT)
Line 87: Line 128:
|The ''Expression Constraint Language'' is a formal syntax for representing SNOMED CT expression constraints. Expression constraints are computable rules used to define a bounded set of clinical meanings represented by either precoordinated or postcoordinated expressions. Expression constraints can be used to restrict the valid values for a data element in an EHR, as the intensional definition of a concept-based reference set, as a machine processable query that identifies a set of matching expressions, or as a constraint that restricts the range of an attribute defined in the SNOMED CT concept model.
|The ''Expression Constraint Language'' is a formal syntax for representing SNOMED CT expression constraints. Expression constraints are computable rules used to define a bounded set of clinical meanings represented by either precoordinated or postcoordinated expressions. Expression constraints can be used to restrict the valid values for a data element in an EHR, as the intensional definition of a concept-based reference set, as a machine processable query that identifies a set of matching expressions, or as a constraint that restricts the range of an attribute defined in the SNOMED CT concept model.
|Expression Constraint [https://confluence.ihtsdotools.org/display/DOCECL Language]
|Expression Constraint [https://confluence.ihtsdotools.org/display/DOCECL Language]
|-
|Facts
|Patient "Facts" - A concept introduced by NEL CSU to include '''1) Clinical components''' - effectively cohorts or registers - for instance long term conditions with national (e.g. QOF) or locally defined definitions e.g. Hypertension, Asthma '''2) Segmentation models''' e.g. "frailty" or likelihood of admission. '''3) Care quality standard compliance''' - quality measures which identify whether care standards are being met, i.e. has a medication review or health check been done? Has the patient had each of 8 care processes for diabetes? Has the patient on a hypertension register had a blood pressure check and met the target?
Developing these at a London scale might be an excellent way to promote standardised Population Health Management.
|
|-
|-
|FHIR
|FHIR
Line 121: Line 167:


[https://blog.softwaresuperglue.com/2018/11/09/information-model-vs-data-model/ Information Model vs Data Model]
[https://blog.softwaresuperglue.com/2018/11/09/information-model-vs-data-model/ Information Model vs Data Model]
|-
|Infrastructure as a service (IaaS)
|'''[[wikipedia:Infrastructure_as_a_service|Infrastructure as a service (IaaS)]]''' is a cloud computing service model by means of which computing resources are supplied by a cloud services provider. The IaaS vendor provides the storage, network, servers and virtualization (which mostly refers, in this case, to emulating computer hardware).
|
|-
|Integrated Care Board (ICB)
|The ICB is a legal entity created in the Health and Care Act 2022.  ICBs will bring the NHS together locally to deliver shared priorities, with a greater emphasis on collaboration and shared responsibility for the health of the local population. This will require governance arrangements that support collective accountability between partner organisations for whole-system delivery and performance. These arrangements should be proportionate, and they must facilitate transparent decision-making and foster the culture and behaviours that enable system working.
|[https://www.england.nhs.uk/wp-content/uploads/2021/06/B1551--Guidance-to-Clinical-Commissioning-Groups-on-the-preparation-of-Integrated-Care-Board-constitutions.pdf Guidance to clinical commissioning groups on preparing integrated care board constitutions]
[https://www.legislation.gov.uk/ukpga/2022/31/contents/enacted Health and Care Act 2022]
|-
|Integrated Care Partnership (ICP)
|The ICP is a joint committee of the ICB and the upper tier local authorities that are wholly or partly in the ICB area. From 1 July, the ICB and the local authorities will be under a legal duty to establish the ICP.
It is the role of the ICP to develop and publish the integrated care strategy for the ICB area, in particular focusing on how health and care can better integrate.
|[https://www.england.nhs.uk/wp-content/uploads/2021/06/B1551--Guidance-to-Clinical-Commissioning-Groups-on-the-preparation-of-Integrated-Care-Board-constitutions.pdf Guidance to clinical commissioning groups on preparing integrated care board constitutions]
|-
|Integrated Care System (ICS)
|Integrated care systems (ICSs) are partnerships of health and care organisations that come together to plan and deliver joined up services and to improve the health of people who live and work in their area.
|[https://www.england.nhs.uk/wp-content/uploads/2021/06/B1551--Guidance-to-Clinical-Commissioning-Groups-on-the-preparation-of-Integrated-Care-Board-constitutions.pdf Guidance to clinical commissioning groups on preparing integrated care board constitutions]
|-
|-
|Joint Controller Agreement (JCA)
|Joint Controller Agreement (JCA)
Line 129: Line 193:
|An information architecture where concepts are represented as nodes and edges in a network of relationships. This is usually represented in a NoSQL architecture designed around relationships between concepts.
|An information architecture where concepts are represented as nodes and edges in a network of relationships. This is usually represented in a NoSQL architecture designed around relationships between concepts.
|
|
|-
|Level 2 - the London Health Data Service (LHDS)
|The name is derived from the Local Health and Care Exemplar Level 2 work. It now refers to the data service hosted by NE London ICB in their Microsoft Azure environment. This data service is now called the London Health Data Service. In the first instance it will contain close to real-time primary care data accessed using IM1 feeds, in addition to commissioning data sets that have been processed by NHS Digital. In time it is hoped that it will process real-time HL7 messages from provider organisations, unstructured data, imaging and "'omics". It is intended that it will be a part of the London sub-National Secure Data Environment for Research and Development (SN SDE for R+D). In phase 1 of the SN SDE for R+D the LHDS will send data to the NWL Discover-Now architecture, which, in turn, will act as a Trusted Research Environment (TRE) for London. The hope is that the LHDS will be available for multiple other purposes including individual (direct) care, population health management, public health and commissioning. LHDS is also an acronym signifying the London Health Data Strategy.
|[https://digital.nhs.uk/binaries/content/assets/website-assets/services/future-gp-it-systems/im1factsheetfeb2021.docx IM1]
[https://digital.nhs.uk/services/data-services-for-commissioners/commissioning-datasets commissioning data sets]
|-
|-
|[[Logical Data Model]]
|[[Logical Data Model]]
Line 174: Line 244:
|-
|-
|OMOP Data Model
|OMOP Data Model
|OMOP stands for Observational Medical Outcomes Partnership, which was formed to inform the appropriate use of observational healthcare databases. OHDSI (Observational Health Data Sciences and Informatics) is a collaborative that now includes all of the original OMOP research investigators and will continue to develop tools using the OMOP common data model and vocabulary (OMOP is no longer an active programme).
|OMOP stands for Observational Medical Outcomes Partnership, which was formed to inform the appropriate use of observational healthcare databases. OHDSI (Observational Health Data Sciences and Informatics) is a collaborative that now includes all of the original OMOP research investigators and will continue to develop tools using the OMOP common data model and vocabulary (OMOP is no longer an active programme). OMOP is cited in the UK Health Data Research paper on building (federated) [https://zenodo.org/record/5767586/files/211208%20Building%20TREs%20Paper%20v1.0.pdf Trusted Research Environments] (p22).
|[https://www.ohdsi.org/data-standardization/the-common-data-model/ OMOP]
|[https://www.ohdsi.org/data-standardization/the-common-data-model/ OMOP]
[https://github.com/OHDSI/CommonDataModel/blob/v5.4.0/inst/ddl/5.4/sql_server/OMOPCDM_sql_server_5.4_ddl.sql OMOP Common Data Model (GitHub)]
[https://github.com/OHDSI/CommonDataModel/blob/v5.4.0/inst/ddl/5.4/sql_server/OMOPCDM_sql_server_5.4_ddl.sql OMOP Common Data Model (GitHub)]
Line 204: Line 274:
|The physical data model describes how a database should be structured and is a representation of table structures, columns, column names, column constraints, primary keys, foreign keys, and any other physical features of the database. A database is an implementation of a physical data model.
|The physical data model describes how a database should be structured and is a representation of table structures, columns, column names, column constraints, primary keys, foreign keys, and any other physical features of the database. A database is an implementation of a physical data model.
|[[wikipedia:Physical_schema|Physical Schema]]
|[[wikipedia:Physical_schema|Physical Schema]]
|-
|Platform as a service (PaaS)
|[[wikipedia:Platform_as_a_service|'''Platform as a service''' ('''PaaS''')]] or '''application platform as a service''' ('''aPaaS''') or platform-based service is a category of cloud computing services that allows customers to provision, instantiate, run, and manage a modular bundle comprising a computing platform and one or more applications
|
|-
|-
|Primary Care Reference Set
|Primary Care Reference Set
|This is a cluster of codes used within business rules authored and maintained by NHSD's primary care domain.
|This is a cluster of codes used within business rules authored and maintained by NHSD's primary care domain.
|[https://digital.nhs.uk/data-and-information/data-collections-and-data-sets/data-collections/quality-and-outcomes-framework-qof/quality-and-outcome-framework-qof-business-rules/primary-care-domain-reference-set-portal NHSD Primary Care Domain Reference Set Portal]
|[https://digital.nhs.uk/data-and-information/data-collections-and-data-sets/data-collections/quality-and-outcomes-framework-qof/quality-and-outcome-framework-qof-business-rules/primary-care-domain-reference-set-portal NHSD Primary Care Domain Reference Set Portal]
|-
|Processor
|''''processor'''<nowiki/>' means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller
|[https://www.privacy-regulation.eu/en/article-4-definitions-GDPR.htm Processor]
|-
|Public benefit
|Public benefit means that there should be some ‘net good’ accruing to the public; it has both a benefit aspect and a public aspect. The benefit aspect requires the achievement of good, not outweighed by any associated risk. Good is interpreted in a broad and flexible manner and can be direct, indirect, immediate or long-term. Benefit needs to be identifiable, even if it cannot be immediately quantified or measured. The public aspect requires demonstrable benefit to accrue to the public, or a section of the public.
|[https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1124013/NDG_public_benefit_guidance_v1.0_-_14.12.22.pdf Public Benefit]
|-
|-
|Relational Database
|Relational Database
Line 216: Line 307:
|A clinical idea to which a unique concept identifier has been assigned.
|A clinical idea to which a unique concept identifier has been assigned.
|[https://confluence.ihtsdotools.org/display/DOCGLOSS/concept SNOMED CT Concept]
|[https://confluence.ihtsdotools.org/display/DOCGLOSS/concept SNOMED CT Concept]
|-
|Software as a service (SaaS)
|'''[[wikipedia:Software_as_a_service|Software as a service]]''' ('''SaaS''' /sæs/) is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted. SaaS is also known as "on-demand software" and Web-based/Web-hosted software.
|
|-
|-
|SQL
|SQL
Line 223: Line 318:
|SQL Database
|SQL Database
|A SQL Database is a relational database, which is a collection of tables storing a specific set of structured data, with a fixed schema, which can be queried using SQL
|A SQL Database is a relational database, which is a collection of tables storing a specific set of structured data, with a fixed schema, which can be queried using SQL
|
|-
|Star Schema
|A simple database schema which consists of fact tables and dimension tables
|[[wikipedia:Star_schema|Star schema]]
|-
|Synthetic data
|'''Synthetic data''' is information that's artificially generated rather than produced by real-world events. Typically created using algorithms, synthetic data can be deployed to validate mathematical models and to train machine learning models
|[[wikipedia:Synthetic_data|Synthetic data]]
|-
|Subscription (Azure)
|An Azure subscription describes all resources in MS Azure that are grouped and paid for in one bill by one bill payer. The resources may include Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). Several subscriptions can be owned by one tenant.
|[https://www.parallels.com/blogs/ras/azure-subscription/#:~:text=An%20Azure%20subscription%20is%20a,are%20used%20and%20billed%20together. Azure Subscription]<br />
|-
|Tenancy/Tenant (Azure)
|An Azure tenant represents an organisation. It is a dedicated instance of Azure Active Directory (Azure AD - which is an enterprise identity service which holds all the user accounts, and provides role-based access, single sign-on, multifactor authentication etc ensuring only the right people have access to the right resources, only when they need it). Each Azure tenant is distinct and separate from other Azure AD tenants. It should not be confused with an Azure Subscription
|[https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/active-directory-whatis Azure Active Directory]<br />
|-
|Trusted Data Environment
|"Raw [health and care] data is not powerful on its own. It must be shaped, checked, and curated into shape. It must be housed, and managed securely. It must be analysed. And then it must be communicated, and acted upon. That work all requires people, with modern data skills, in teams, using platforms that protect patients’ privacy and avoid needless duplication of effort."
|[https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1067053/goldacre-review-using-health-data-for-research-and-analysis.pdf Better, Broader, Safer: Using Health Data for Research and Analysis]
|-
|Trusted Research Environment
|An approach to data access based primarily around Trusted (Trustworthy) Research Environments (TREs), a type of Secure Data Environment; with appropriate robust and independent accreditation, monitoring and auditing.
|[https://zenodo.org/record/5767586/files/211208%20Building%20TREs%20Paper%20v1.0.pdf Building Trusted Research Environments.  Principles and Best Practices; towards TRE ecosystems.]
|-
|Value Set
|A generic term for a set of values, or codes, which might be stored in a metadata library or other set of lookup data. Value sets are used as the filters of data to create Data Sets - for example, in the case of populating the London Care Record from DDS, Value Sets are used to filter the underlying data in order to create a Data Set which is suitable for ingestion.
Concepts and Clusters are types of Value Set.
Code Set is sometimes used interchangeably with Value Set
|
|
|}
|}

Latest revision as of 08:00, 16 February 2023

This is an area of the Wiki for members of the Discovery Collaborative to share definitions of common terms. It is public and can be referenced from other documents and websites as necessary. If you would like to challenge or add definitions, please contact simon.meredith@nhs.net or jack.barker@nhs.net.

!Term
!Definition
!External Links
|-
|Canonical Data Model
|This is a type of data model that presents data entities and relationships in the simplest possible form. It is a design pattern which allows communication between different data formats. Canonical form describes the simplest representation of an object.
|Canonical model - Wikipedia
|-
|Categories
|Broad sections of an Ontology or Classification such as the Categories in ICD 10 which contain all ICD 10 codes related to one body part. It might also be large groupings of test results e.g. Haematology, Biochemistry, Immunology, Microbiology etc.
|ICD 10
|-
|Cloud
|The Cloud refers to IT infrastructure, platforms, and services that are hosted in a remote data centre, managed by a third party. Typically these data centres are run and managed by large companies such as Microsoft (Azure), Amazon (AWS), and Google (GCP), but there are many other providers too. Using the cloud removes the need for customers to buy and maintain their own physical IT equipment. Instead, the cloud provider typically charges customers daily rates based on storage used, computational power used, memory, number of servers etc.
|Cloud Computing
|-
|Cloud Native
|Cloud Native refers to building software and systems in the cloud that make use of services made available by cloud providers, to run scalable, resilient applications. For example, instead of creating a SQL database on a server in the cloud, which the customer would need to monitor, maintain, patch, back-up etc, they could make use of a “SQL database as a service” from the cloud provider, who would automatically take care of all the maintenance without any further intervention from the customer.
|Cloud Native Computing
|-
|Cluster
|A group of concept codes making another idea. The Cluster may be made up of concepts from different ontologies or classifications e.g., Snomed CT and ICD10. Synonyms for Cluster are refset, value set, codelist, code set, grouper. The name "cluster" was strongly advocated for by John Robson (Clinical Effectiveness Group (CEG) NEL) because of his work in primary care.
|Primary Care Domain Reference Set Portal
|-
|Common Data Model
|This is the shared Data Model which has been adopted as a standard for an organisation or group. It defines how the elements of data relate to one another and enables data to be transferred between different systems which share the Common Data Model with ease.

It defines the transformation of data from multiple different databases with potentially multiple coding systems into a common storage format (data model) as well as a common representation (terminologies, vocabularies, coding schemes)
|Microsoft Example
|-
|Common Data Model (Discovery)
|The Discovery Common Data Model is the schema of the Discovery Compass v2 databases, which exist in the Discovery Core and are held as subscriber databases within ICS environments. The schema is documented [https://wiki.discoverydataservice.org/index.php?title=Remote_Subscriber_Database_(RSD)_Schema_(Compass_2) here]. It is thought to be based on the EMIS primary care system schema.
|
|-
|Compass Database (Compass v2)
|The database schema used in "subscriber" databases in DDS. This is often referred to as the DDS Common Data Model. There are currently three versions of the Compass database within the Discovery AWS core, one for each ICS: NEL, SEL and NWL. These are still referred to as "Subscriber" databases by the original DDS management team. There are four "Compass" databases outside the AWS core: one with NEL CEG, one with NEL ICS, one with SEL ICS and one in the NWL WSIC infrastructure.
|Compass 2 Schema Mappings
Compass Database Mappings
|-
|Concept
|A coded idea. For example, Name = Baker's Asthma, SNOMED CT code = 34015007
|
|-
|Conceptual Data Model
|This is a high level model, describing business processes and how they relate to one another. In simple terms, it is a map of concepts and rules relating to the business which can then be used to define relationships between data entities in order to describe the business concepts – this leads to the creation of a logical data model
|
|-
|Customer Managed Keys
|Customer Managed Keys are security keys that are created, owned, and managed by the customer. The customer has full control over the keys, including enabling and disabling them, setting key policies, rotating cryptographic material, and scheduling deletion. In simple terms, this means a supplier may hold the customer's data in their system (software as a service, for example), encrypted with the customer managed key. The supplier is unable to decrypt the data; only the customer can do so, as they hold the key.
|Customer Managed Keys - AWS
Key Management in Azure
|-
|Data Controller or simply Controller
|Data controller means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law (Definition from Article 4 (7) GDPR)
|Controller
|-
|Data Dictionary
|A centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format e.g. NHS Data Dictionary.
|Data Dictionary
|-
|Data Lake
|A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video). A data lake can be established "on premises" (within an organization's data centers) or "in the cloud" (using cloud services from vendors such as Amazon, Microsoft, or Google).

Poorly-managed data lakes have been facetiously called data swamps.
|Data Lake
|-
|Data Mart
|A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area. Data marts make specific data available to a defined group of users, which allows those users to quickly access critical insights without wasting time searching through an entire data warehouse
|Data Mart
|-
|Data Model
|Data models exist at different levels of abstraction, from conceptual data models which are high-level and consider business activity and relationships, to a physical data model which represents the implementation of a database. A data model shows how elements of data relate to one another and how they are structured, how they relate to business needs, and how they can be manipulated.
|Data Model
NHS Data Model and Dictionary Service
|-
|Data Platform
|A Data Platform is any digital solution which enables a single organisation to store, transform, aggregate, and analyse data. It integrates many different technologies for different purposes to meet the data needs of the organisation. It includes security and access controls to ensure IG compliance
|
Database Index Indexes are used to quickly sort and retrieve data in a database. They may take the form of a physical, clustered index, which is the physical ordering of the data in the table (e.g. by ID or surname), or a non-clustered index which describes the logical ordering of the data separately – like the index at the back of a book: the index key is ordered (e.g. surname) and next to it is information showing where to find the record.
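As an illustration of the non-clustered case, the following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The patient table, its columns, and the index name are hypothetical and are not taken from any DDS schema.

```python
import sqlite3

# In-memory database with a hypothetical patient table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (id INTEGER PRIMARY KEY, surname TEXT, forename TEXT)"
)
conn.executemany(
    "INSERT INTO patient (surname, forename) VALUES (?, ?)",
    [("Smith", "Ann"), ("Jones", "Bob"), ("Patel", "Chandni")],
)

# A separate index ordered by surname, like the index at the back of a book.
conn.execute("CREATE INDEX idx_patient_surname ON patient (surname)")

# Ask SQLite how it would run a surname lookup; the last column of each
# plan row is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM patient WHERE surname = ?", ("Jones",)
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

The printed plan should mention idx_patient_surname, showing that the lookup goes through the index rather than scanning every row.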
Database Schema This is the definition of the structure of a relational database and defines how data is organised, including the table names, fields (also known as columns), data types, constraints, indexes, and relationships between these entities. The schema is a blueprint of how the database is constructed.
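A minimal sketch of a schema acting as a blueprint, using Python's built-in sqlite3 module. The observation table and its columns are assumptions made for illustration; they are not the Compass schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: table name, columns, data types and constraints (hypothetical).
conn.execute(
    """
    CREATE TABLE observation (
        observation_id INTEGER PRIMARY KEY,
        patient_id     INTEGER NOT NULL,
        snomed_code    TEXT    NOT NULL,
        recorded_date  TEXT
    )
    """
)

# Read the structure back from the database itself.
# PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(observation)"
    )
]
for column in columns:
    print(column)
```

Each printed tuple shows the column name, its data type, whether it is mandatory, and whether it is part of the primary key.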
Data Sharing Agreement (DSA) Framework The arrangements for storing and managing data sharing agreements to ensure transparency across controllers and processors. It is likely that the Data Usage Committees/Data Access Groups (Joint Controllers) will generate and store a copy of the DSA, whilst processors and sub-processors will also need a copy.
Data Warehouse A data warehouse stores data for the main purpose of reporting and data analysis. They are central repositories which bring together disparate data sets into one place, to enable reporting at scale and pace. Data Warehouse
Dataset A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the data set in question. The data set lists values for each of the variables, such as for example height and weight of an object, for each member of the data set. Data sets can also consist of a collection of documents or files.

In the Discovery Data Service datasets may be generated as a result of queries and valueset filters on the internal relational database.

Dataset
Discovery Collaborative A collaboration of three integrated care systems (ICS) with the ambition to develop and maintain a data service. It arose out of NE London where GPs and academics were working with a charitable trust to develop a secure, trusted, aggregated information resource to support quality improvement and research. As part of the national Local Health and Care Record Exemplar programme it expanded to NW and SE London ICSs. The Collaborative has a Board and working groups. The Chair of the Board reports to the Senior Responsible Officer for digital in each of the three ICSs.
Discovery Data Service (DDS) The hardware and software made available by the Discovery Collaborative. The DDS is not an organisation. The management and development of the bulk of the DDS is currently sub-contracted to a private company Voror in an Amazon Web Service Environment. However, there have been discussions about in-housing the service under closer control by NEL CCG and moving the service to a Microsoft Azure environment.
Discovery Health Information Model The "toolkit" of the Discovery Data Service: 1) A set of ontologies and classifications 2) A common data model 3) A metadata library of clusters of codes (codesets, valuesets etc), currently the IM1 data tables in the DDS that are managed through IM2 (see below) 4) A catalogue of reference data (such as geographical areas, organisations and people) 5) A library of queries 6) A set of “maps” (mappings) 7) An Information Manager, currently Endeavour Information Manager 2 (IM2), an open source set of utilities that can be used to browse, search, or maintain the model. Discovery Health Information Model
Endeavour Information (Model) Manager The Information Model Manager is an application developed by the Endeavour Charitable Trust which provides a view of the Common Information Model and supports authoring of the content of the model. It can be used to view the model structure and content, download artefacts (e.g. value sets), and manage the ontology. It can be used to view entities and their relationships (e.g. an A&E admission, which is an Encounter and has a subject of a Patient and has a sub-encounter of a Triage Encounter etc) Endeavour IM2
Environment (computing) An environment is the collection of computer machinery, data storage devices, work stations, software applications, and networks that support the processing and exchange of electronic information required by the software application. Typically there will be several environments for one system, often including a development environment (where developers can test out new code and see how it interacts), a staging, or test, environment (where new code is subject to an automated test suite comprising end-to-end tests, regression testing, unit and integration tests, as well as manual User Acceptance Testing), and a production environment (which is where the live application runs).
Expression (Snomed CT) A combination of concepts that represents something that is not already represented as a Concept or Cluster. It typically links several concepts, for example a disease of infection (finding) with a bacteria name (organism), sepsis (observable entity) and kidney (body structure). Snomed CT Expression
Expression Constraint Language The Expression Constraint Language is a formal syntax for representing SNOMED CT expression constraints. Expression constraints are computable rules used to define bounded sets of clinical meanings represented by either precoordinated or postcoordinated expressions. Expression constraints can be used to restrict the valid values for a data element in an EHR, as the intensional definition of a concept-based reference set, as a machine processable query that identifies a set of matching expressions, or as a constraint that restricts the range of an attribute defined in the SNOMED CT concept model. Expression Constraint Language
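To give a feel for what a simple constraint computes, the sketch below mimics two ECL operators in Python: "<" (descendants of a concept) and "<<" (descendants or the concept itself). The hierarchy and concept names are a hypothetical toy example, not real SNOMED CT content, and a real terminology server would evaluate such constraints against the full release.

```python
# Hypothetical "is a" hierarchy: each concept maps to its direct parents.
IS_A = {
    "respiratory-disorder": [],
    "asthma": ["respiratory-disorder"],
    "occupational-asthma": ["asthma"],
    "bakers-asthma": ["occupational-asthma"],
}


def ancestors(concept):
    """Return every concept reachable by following "is a" links upwards."""
    found = set()
    stack = list(IS_A.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(IS_A.get(parent, []))
    return found


def descendants_of(concept):
    """Toy equivalent of the ECL constraint "< concept"."""
    return {c for c in IS_A if concept in ancestors(c)}


def descendants_or_self_of(concept):
    """Toy equivalent of the ECL constraint "<< concept"."""
    return descendants_of(concept) | {concept}


print(sorted(descendants_of("asthma")))
print(sorted(descendants_or_self_of("asthma")))
```

The result of each function is a bounded set of concepts, which is the sense in which an expression constraint can serve as the intensional definition of a reference set.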
Facts Patient "Facts" - A concept introduced by NEL CSU to include 1) Clinical components - effectively cohorts or registers - for instance long term conditions with national (e.g. QOF) or locally defined definitions e.g. Hypertension, Asthma 2) Segmentation models e.g. "frailty" or likelihood of admission. 3) Care quality standard compliance - quality measures which identify whether care standards are being met i.e. has a medication review or health check been done? Has the patient had each of 8 care processes for diabetes? Has the patient on a hypertension register had a blood pressure check and met the target?

Developing these at a London scale might be an excellent way to promote standardised Population Health Management.

FHIR FHIR stands for Fast Healthcare Interoperability Resources. It is an international standard for health care data exchange and is published by HL7. It is designed to enable the exchange of healthcare related information, including clinical, administrative, public health, and research data. DDS stores data in the core in FHIR format before it is restructured and sent to subscriber databases.
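For orientation, the following sketch builds a very small FHIR Patient resource as JSON using Python's standard library. The identifier, name and date are invented, and the snippet says nothing about how the DDS core actually structures or stores its FHIR resources.

```python
import json

# A minimal FHIR Patient resource; all values are invented for illustration.
patient_resource = {
    "resourceType": "Patient",
    "id": "example-patient-1",
    "name": [{"family": "Example", "given": ["Alex"]}],
    "gender": "female",
    "birthDate": "1980-01-31",
}

# FHIR resources are commonly exchanged as JSON documents.
serialised = json.dumps(patient_resource, indent=2)
round_tripped = json.loads(serialised)
print(serialised)
```

Every FHIR resource carries a resourceType field, which tells the receiving system how to interpret the rest of the document.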
Functional Requirements These are product features or functionality that must be developed in order for the system to satisfy requirements. They must be implemented in the system in order for it to achieve what it is supposed to accomplish. Examples include "users should be able to download data based on their filtered search criteria" or "only admin-level users should be able to reset passwords for other users" Functional requirement
GitHub GitHub is a provider of version control and source-code management using Git (which is software used for tracking changes in files). It enables software developers to safely store source code and work on it collaboratively, merging changes from different developers into the main "production" branch safely. It provides additional features such as bug/issue tracking, continuous integration, feature requests etc. GitHub was acquired by Microsoft in 2018. GitHub
Graph Database Graph Databases are used to represent data as elements and their relationships (known as nodes and edges). They are particularly beneficial when representing large datasets with complex and numerous relationships, such as social media contacts and relations. In a relational database it is often necessary to join many tables to return data; in a graph database this isn’t necessary. Relationships are stored natively alongside the data elements, enabling much faster querying of highly connected data.
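The idea of nodes and edges can be sketched in plain Python, with no graph database product involved. The people and relationships below are hypothetical; the point is that following relationships is a direct traversal rather than a series of table joins.

```python
from collections import deque

# Hypothetical social graph: each node maps to the nodes it points at (edges).
edges = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Dev"],
    "Cara": ["Dev"],
    "Dev": [],
}


def reachable(start, graph):
    """Breadth-first traversal: every node reachable from start via edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)  # report only the other nodes
    return seen


print(sorted(reachable("Ann", edges)))
```

A graph database performs this kind of traversal natively and at much larger scale.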

HDRUK Phenotype Library This is a comprehensive, open access resource which exists to provide the research community with information, tools, and phenotyping algorithms for UK electronic health records (EHRs) HDR UK Phenotype Library
Information Model A model describing ontologically standardised data items following a specific database schema within a specific relational database architecture.

It is an abstract model, showing the constraints, relationships, concepts, and data items for a domain.

The Discovery Information Model is a representation of the meaning and structure of data held in the electronic records of the health and social care sector. It includes libraries of queries, value sets, concept sets, data set definitions and mappings. The main purpose is to bridge the chasm that exists between highly technical digital representations and plain language so that when questions are asked of data, a lay person could use plain language without prior knowledge of the underlying models.

Information Model

DDS IM

IM Viewer

Information Model vs Data Model

Infrastructure as a service (IaaS) Infrastructure as a service (IaaS) is a cloud computing service model by means of which computing resources are supplied by a cloud services provider. The IaaS vendor provides the storage, network, servers and virtualization (which mostly refers, in this case, to emulating computer hardware).
Integrated Care Board (ICB) The ICB is a legal entity created in the Health and Care Act 2022. ICBs will bring the NHS together locally to deliver shared priorities, with a greater emphasis on collaboration and shared responsibility for the health of the local population. This will require governance arrangements that support collective accountability between partner organisations for whole-system delivery and performance. These arrangements should be proportionate, and they must facilitate transparent decision-making and foster the culture and behaviours that enable system working. Guidance to clinical commissioning groups on preparing integrated care board constitutions

Health and Care Act 2022

Integrated Care Partnership (ICP) The ICP is a joint committee of the ICB and the upper tier local authorities that are wholly or partly in the ICB area. From 1 July, the ICB and the local authorities will be under a legal duty to establish the ICP.

It is the role of the ICP to develop and publish the integrated care strategy for the ICB area, in particular focusing on how health and care can better integrate.

Guidance to clinical commissioning groups on preparing integrated care board constitutions
Integrated Care System (ICS) Integrated care systems (ICSs) are partnerships of health and care organisations that come together to plan and deliver joined up services and to improve the health of people who live and work in their area. Guidance to clinical commissioning groups on preparing integrated care board constitutions
Joint Controller Agreement (JCA) The arrangements by which a group of data controllers agree to work together to agree the uses of data and how they jointly take responsibility for that.  This is frequently manifested as the development of a representative committee that reviews the five safes, the legal basis for use of personal data, the worthiness of the proposed use of data Joint Controllers
Knowledge Graph An information architecture where concepts are represented as nodes and edges in a network of relationships. This is usually represented in a NoSQL architecture designed around relationships between concepts.
Level 2 - the London Health Data Service (LHDS) The name is derived from the Local Health and Care Exemplar Level 2 work. It now refers to the data service hosted by NE London ICB in their Microsoft Azure environment. This data service is now called The London Health Data Service. In the first instance it will contain close to real-time primary care data accessed using IM1 feeds, in addition to commissioning data sets that have been processed by NHS Digital. In time it is hoped that it will process real-time HL7 messages from provider organisations, unstructured data, imaging and "'omics". It is intended that it will be a part of the London sub-National Secure Data Environment for Research and Development (SN SDE for R+D). In phase 1 of the SN SDE for R+D the LHDS will send data to the NWL Discover-Now architecture, which, in turn, will act as a Trusted Research Environment (TRE) for London. The hope is that the LHDS would be available for multiple other purposes including individual (direct) care, population health management, public health and commissioning. LHDS is also an acronym signifying the London Health Data Strategy. IM1

commissioning data sets

Logical Data Model The logical data model (or logical schema) describes how data objects relate to each other, but is independent of the technology or database management system used. Logical data models typically show entities (e.g. a patient, an appointment, a result), relationships (e.g. an appointment is linked to a patient via a unique identifier), and attributes (information that is useful to further describe the entities) Logical Schema
London Care Record A shared care record used for individual care. It is based on Cerner Health Information Exchange architecture.
Mauro Data Mapper Mauro Data Mapper is a third-party toolkit for the design and documentation of databases, data flows, and data standards, as well as related software artefacts such as data schemas and data forms. It was originally developed for the description of data in clinical research. Mauro Data Mapper
Medical Classification Medical Classifications transform descriptions of procedures or diagnoses into standardised codes through the process of clinical coding, e.g. ICD10, OPCS4, LOINC. These can also be called code sets. Hence, we call them a medical classification to avoid confusion with Clusters.

Medical Classification
Metadata Library A tool that allows management of clusters
NHS Terminology Server The NHS Terminology Server “is a FHIR compliant solution that holds and disseminates assured international terminologies and classifications (such as SNOMED-CT and ICD-10) and national terminologies (such as NHS Data Model and Dictionary codes).”

See https://digital.nhs.uk/services/terminology-servers. Content is delivered in machine readable format and can be accessed as a real-time resource through APIs to support other applications.

NHSD Terminology Server
Non-functional Requirements These are requirements concerning the operation of a system rather than functionality, and typically relate to security, scalability, performance, reliability etc. Examples include "the page must load within 500ms" or "the website must conform to WCAG AA accessibility standards" Non-functional Requirements
Normalisation (health data) In healthcare, this is the process of taking data from different sources, in different formats and with different code sets, and converting the data into a singular, unified clinical language or terminology – for example mapping proprietary codes from EPR systems to SNOMED codes Healthcare Normalisation

What is data normalisation?
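The mapping step described under Normalisation (health data) can be sketched as a lookup from source-system codes to a common terminology. The source system names and local codes below are invented; the SNOMED CT code 34015007 (Baker's Asthma) is the one quoted in the Concept entry of this glossary. A real mapping would come from a maintained mapping table rather than a hard-coded dictionary.

```python
# Hypothetical mapping from (source system, local code) to a SNOMED CT code.
LOCAL_TO_SNOMED = {
    ("SystemA", "A123"): "34015007",
    ("SystemB", "BAK-AST"): "34015007",
}


def normalise(records):
    """Split records into those mapped to SNOMED CT and those left unmapped."""
    normalised, unmapped = [], []
    for record in records:
        key = (record["source"], record["code"])
        if key in LOCAL_TO_SNOMED:
            normalised.append(
                {
                    "patient_id": record["patient_id"],
                    "snomed_code": LOCAL_TO_SNOMED[key],
                }
            )
        else:
            unmapped.append(record)
    return normalised, unmapped


records = [
    {"patient_id": 1, "source": "SystemA", "code": "A123"},
    {"patient_id": 2, "source": "SystemB", "code": "BAK-AST"},
    {"patient_id": 3, "source": "SystemB", "code": "UNKNOWN"},
]
normalised, unmapped = normalise(records)
print(normalised)
print(unmapped)
```

Keeping the unmapped records visible, rather than silently dropping them, makes gaps in the mapping easy to review.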

Normalisation (database) This is the process of structuring a relational database in accordance with a series of derived rules, called normal forms, in order to reduce redundancy, duplication, and improve data integrity. Normalisation
NoSQL Database NoSQL stands for Not Only SQL, and is a generic name given to databases which store data in a non-tabular format. They are not relational databases – they may use storage techniques such as key-value pairs, document stores, graph databases. In some cases, SQL can still be used to query them
OMOP Data Model OMOP stands for Observational Medical Outcomes Partnership, which was formed to inform the appropriate use of observational healthcare databases. OHDSI (Observational Health Data Sciences and Informatics) is a collaborative that now includes all of the original OMOP research investigators and will continue to develop tools using the OMOP common data model and vocabulary (OMOP is no longer an active programme). OMOP is cited in the UK Health Data Research paper on building (federated) Trusted Research Environments (p22). OMOP

OMOP Common Data Model (GitHub)

On-Premise (On-Prem) IT infrastructure built and maintained in-house – for example, database servers sitting in the basement of a hospital building which the IT department maintain and upgrade themselves
Ontology An Ontology is a set of concepts and categories showing the relationships and properties between them in a particular domain. In healthcare, an ontology is used for modelling the semantics of medical concepts and to enable the exchange of medical data between systems. The ontology most commonly used at present is SNOMED CT. The DDS ontology is made up of several ontologies (such as READ, CTV3 etc) mapped to SNOMED where appropriate SNOMED terms already exist, and extended where they don't.

A representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse (e.g., medicine). Example: Snomed CT.

Ontology

NHSD SNOMED browser

SNOMED.org

SNOMED - NHS England

OpenCodelists OpenCodelists was created by OpenSAFELY for creating and sharing codelists. See "Cluster" definition above. OpenCodelists

OpenSAFELY

PESTLE Analysis A PESTLE analysis studies the key external factors (Political, Economic, Sociological, Technological, Legal and Environmental) that influence an organisation. It can be used in a range of different scenarios, and can guide people professionals and senior managers in strategic decision-making Pestle Analysis
Physical Data Model The physical data model describes how a database should be structured and is a representation of table structures, columns, column names, column constraints, primary keys, foreign keys, and any other physical features of the database. A database is an implementation of a physical data model. Physical Schema
Platform as a service (PaaS) Platform as a service (PaaS) or application platform as a service (aPaaS) or platform-based service is a category of cloud computing services that allows customers to provision, instantiate, run, and manage a modular bundle comprising a computing platform and one or more applications
Primary Care Reference Set This is a cluster of codes used within business rules authored and maintained by NHSD's primary care domain. NHSD Primary Care Domain Reference Set Portal
Processor 'processor' means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller Processor
Public benefit Public benefit means that there should be some ‘net good’ accruing to the public; it has both a benefit aspect and a public aspect. The benefit aspect requires the achievement of good, not outweighed by any associated risk. Good is interpreted in a broad and flexible manner and can be direct, indirect, immediate or long-term. Benefit needs to be identifiable, even if it cannot be immediately quantified or measured. The public aspect requires demonstrable benefit to accrue to the public, or a section of the public.

Public Benefit
Relational Database A Relational database is one where the data is stored in tables, featuring rows and columns and has predefined relationships between the data items. Typically, a table will have a primary key, which is a unique identifier for items in that table. Other tables will reference that primary key for items that are related to it by means of a foreign key. For example, in a table of patients, the primary key might be the NHS number
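A minimal sketch of primary and foreign keys using Python's built-in sqlite3 module. The tables, column names and NHS numbers are invented for illustration and do not reflect any DDS database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# patient.nhs_number is the primary key; appointment.nhs_number is a foreign key to it.
conn.execute(
    "CREATE TABLE patient (nhs_number TEXT PRIMARY KEY, surname TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE appointment (
        appointment_id   INTEGER PRIMARY KEY,
        nhs_number       TEXT NOT NULL REFERENCES patient (nhs_number),
        appointment_date TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO patient VALUES ('9000000001', 'Example')")
conn.execute(
    "INSERT INTO appointment (nhs_number, appointment_date) "
    "VALUES ('9000000001', '2024-01-15')"
)

# Related rows are brought together by joining on the key.
rows = conn.execute(
    """
    SELECT p.surname, a.appointment_date
    FROM appointment AS a
    JOIN patient AS p ON p.nhs_number = a.nhs_number
    """
).fetchall()

# An appointment for a patient that does not exist violates the foreign key.
try:
    conn.execute(
        "INSERT INTO appointment (nhs_number, appointment_date) "
        "VALUES ('9000000002', '2024-02-01')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, orphan_rejected)
```

The rejected insert shows how the predefined relationship protects data integrity: an appointment cannot reference a patient who is not in the patient table.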
SNOMED CT Concept A clinical idea to which a unique concept identifier has been assigned. SNOMED CT Concept
Software as a service (SaaS) Software as a service (SaaS /sæs/) is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted. SaaS is also known as "on-demand software" and Web-based/Web-hosted software.
SQL SQL stands for Structured Query Language. It is the standard language used to define, query, and modify data held in relational databases (see the Relational Database entry above). Structured Query Language
SQL Database A SQL Database is a relational database, which is a collection of tables storing a specific set of structured data, with a fixed schema, which can be queried using SQL
Star Schema A simple database schema which consists of fact tables and dimension tables Star schema
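A star schema can be sketched with one fact table surrounded by dimension tables. The example below uses Python's built-in sqlite3 module; the table names, keys and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_practice (practice_key INTEGER PRIMARY KEY, practice_name TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- The fact table holds measures plus a key into each dimension.
    CREATE TABLE fact_appointment (
        practice_key      INTEGER REFERENCES dim_practice (practice_key),
        date_key          INTEGER REFERENCES dim_date (date_key),
        appointment_count INTEGER
    );

    INSERT INTO dim_practice VALUES (1, 'Practice A'), (2, 'Practice B');
    INSERT INTO dim_date VALUES (202401, 2024, 1), (202402, 2024, 2);
    INSERT INTO fact_appointment VALUES (1, 202401, 120), (1, 202402, 95), (2, 202401, 80);
    """
)

# A typical report: aggregate the facts, labelled by a dimension attribute.
totals = conn.execute(
    """
    SELECT p.practice_name, SUM(f.appointment_count)
    FROM fact_appointment AS f
    JOIN dim_practice AS p ON p.practice_key = f.practice_key
    GROUP BY p.practice_name
    ORDER BY p.practice_name
    """
).fetchall()
print(totals)
```

Because every dimension joins directly to the fact table, reporting queries stay short and predictable.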
Synthetic data Synthetic data is information that's artificially generated rather than produced by real-world events. Typically created using algorithms, synthetic data can be deployed to validate mathematical models and to train machine learning models Synthetic data
Subscription (Azure) An Azure subscription describes all resources in MS Azure, that are grouped and paid for in one bill by one bill payer. The resources may include Infrastructure as a Service (IaaS), Platforms as a Service (PaaS) and Software as a Service (SaaS). Several subscriptions can be owned by one tenant. Azure Subscription
Tenancy/Tenant (Azure) An Azure tenant represents an organisation. It is a dedicated instance of Azure Active Directory (Azure AD - which is an enterprise identity service which holds all the user accounts, and provides role-based access, single sign-on, multifactor authentication etc ensuring only the right people have access to the right resources, only when they need it). Each Azure tenant is distinct and separate from other Azure AD tenants. It should not be confused with an Azure Subscription Azure Active Directory
Trusted Data Environment "Raw [health and care] data is not powerful on its own. It must be shaped, checked, and curated into shape. It must be housed, and managed securely. It must be analysed. And then it must be communicated, and acted upon. That work all requires people, with modern data skills, in teams, using platforms that protect patients’ privacy and avoid needless duplication of effort." Better, Broader, Safer: Using Health Data for Research and Analysis
Trusted Research Environment An approach to data access based primarily around Trusted (Trustworthy) Research Environments (TREs), a type of Secure Data Environment, with appropriate robust and independent accreditation, monitoring and auditing.

Building Trusted Research Environments.  Principles and Best Practices; towards TRE ecosystems.
Value Set A generic term for a set of values, or codes, which might be stored in a metadata library or other set of lookup data. Value sets are used as the filters of data to create Data Sets - for example, in the case of populating the London Care Record from DDS, Value Sets are used to filter the underlying data in order to create a Data Set which is suitable for ingestion.

Concepts and Clusters are types of Value Set.

Code Set is sometimes used interchangeably with Value Set
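The filtering role of a Value Set can be sketched in a few lines of Python. The records are invented; 34015007 is the SNOMED CT code quoted for Baker's Asthma in the Concept entry, and the other codes are placeholders, not real codes.

```python
# A value set is simply a set of codes; one real code plus a placeholder.
asthma_value_set = {"34015007", "HYPOTHETICAL-1"}

# Hypothetical coded records from an underlying database.
observations = [
    {"patient_id": 1, "code": "34015007"},
    {"patient_id": 2, "code": "OTHER-CODE"},
    {"patient_id": 3, "code": "HYPOTHETICAL-1"},
]


def build_dataset(records, value_set):
    """Keep only the records whose code belongs to the value set."""
    return [record for record in records if record["code"] in value_set]


dataset = build_dataset(observations, asthma_value_set)
print(dataset)
```

The resulting dataset contains only the records for patients 1 and 3, which is the sense in which Value Sets act as filters that create Data Sets.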