Sets and classes: Difference between revisions


Revision as of 10:28, 7 July 2021

All sets are classes. Not all classes are sets.

Sets are collections of things. Members of a set (items in a set) may or may not share properties with each other.

Classes are collections of things with common properties. Members of a class all share some of their properties. A member that is specialised by having properties in addition to those of the class is considered a subclass.

Sets are the single most significant structure used in health query. A variety of terms is in common use to describe different kinds of set. In the information model the terms "concept sets" and "value sets" are used, the former being defined for use in query and the latter to define the expected values of properties in records.

Snomed-CT uses the term 'reference set' which is semantically the same as a concept set.

It should be noted that an entity may be explicitly defined as being both a class (e.g. a Snomed-CT concept) and a set (e.g. Ethnic category - 2001 Census). In other words, it is an entity with two types.

This creates an interesting twist: a search for the subclasses of Ethnic category - 2001 Census would produce different results from a search for all members of the subsets of Ethnic category - 2001 Census. In other words, how the concept is treated is determined by the use case.

Set definitions can be simple or complex and fall into two main patterns:

Sets whose members are expressions

This pattern forms the majority of sets.

The set is given an identifier (an IRI) and is defined as a concept in its own right. This set pattern then has two predicates:

  1. has Members: a collection of class expressions (e.g. simple concepts or complex expressions).
  2. not Members: a collection of class expressions that are excluded from the set, i.e. those that would otherwise have been included as subclasses of the 'has Members' collection.
im:VSET_RecordType_FamilyHistory             # This is a value set for family history
	rdf:type im:ConceptSet ;
	rdfs:label "Family history" ;
	rdfs:comment "Family history value set not including negative family history" ;
	im:notMembers sn:160266009 ;    # |No family history of clinical finding (situation)|
	im:hasMembers sn:57177007 .     # |Family history with explicit context (situation)|

Sets whose members are subsets

This pattern is used for sets that contain categories, each category being another set.

Categorising set members is a common approach for presenting aggregate data. For example, the set "Ethnic category - 2001 Census" would be modelled as a set with approximately 17 categories (subsets), each category containing many members.

The set is given an identifier and has one predicate:

  1. has Subset: a collection of subsets, each of which may be either a member-based set or a subset-based set.
sn:92381000000106         # |Ethnic category - 2001 census (finding)|
  rdf:type owl:Class, im:ConceptSet ;
  rdfs:label "Ethnic category - 2001 census (finding)" ;
  im:hasSubsets sn:92491000000104 ,   # |African - ethnic category 2001 census (finding)|
                sn:92471000000103 .   # |Bangladeshi or British Bangladeshi - ethnic category 2001 census (finding)|
                # ... etc
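
Because a subset may itself be a subset-based set, collecting the member expressions of a set is naturally recursive. The following is a minimal Python sketch of that idea, not the information model's implementation. All identifiers with the ex: prefix, the intermediate grouping ex:AsianGroup and the dictionary layout are hypothetical and chosen for illustration only.

```python
# Hypothetical set definitions keyed by identifier. A set has either
# "hasMembers" (member-based) or "hasSubsets" (subset-based).
SETS = {
    "ex:EthnicCategory2001": {"hasSubsets": ["ex:African", "ex:AsianGroup"]},
    "ex:AsianGroup": {"hasSubsets": ["ex:Bangladeshi"]},
    "ex:African": {"hasMembers": ["ex:AfricanConcept"]},
    "ex:Bangladeshi": {"hasMembers": ["ex:BangladeshiConcept"]},
}


def member_expressions(set_id, sets):
    """Return the member expressions of a set, descending through subsets."""
    definition = sets[set_id]
    members = list(definition.get("hasMembers", []))
    for subset_id in definition.get("hasSubsets", []):
        members.extend(member_expressions(subset_id, sets))
    return members


all_members = member_expressions("ex:EthnicCategory2001", SETS)
print(all_members)
```

The sketch assumes the subset graph contains no cycles; a real implementation would need to guard against them.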

Set definitions versus set expansions

Sets are usually defined with only a few members. Each member is a class definition.

When a set is used in a query it is usually necessary to use the "expanded" set, which is in effect all of the subclasses of each member, obtained via a transitive closure of the "isa" or "subClassOf" predicate.

Thus the following query would be used to expand a member based concept set as above. Note that the im:isA predicate is the same as the Snomed-CT "is a" relationship and the same as rdfs:subClassOf.

select ?concept
where {
    ?concept im:isA* ?member.
    ?member ^im:hasMember im:VSET_RecordType_FamilyHistory   
}

---Result ---
?concept
sn:64571000119104 |Family history of abdominal wall defect (situation)|
sn:430560006 |Family history of chronic renal impairment (situation)|
sn:959511000000100 |Family history of end stage renal disease (situation)|
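
The expansion can also be sketched outside SPARQL. The Python below is a minimal illustration under stated assumptions: the "is a" edges are a toy hierarchy, not taken from Snomed-CT, and ex:NoFamilyHistoryOfAsthma is a hypothetical concept. Unlike the query above, the sketch also subtracts the expansion of the excluded (not Members) expressions.

```python
from collections import defaultdict

# Toy "is a" edges as (child, parent). Illustrative only.
IS_A = [
    ("sn:160266009", "sn:57177007"),
    ("sn:430560006", "sn:57177007"),
    ("sn:959511000000100", "sn:430560006"),
    ("ex:NoFamilyHistoryOfAsthma", "sn:160266009"),  # hypothetical concept
]


def descendants_or_self(concept, children):
    """Reflexive transitive closure of 'is a', like the im:isA* path."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children[current])
    return seen


def expand(has_members, not_members, is_a_edges):
    """Expand included members, then remove the expansion of exclusions."""
    children = defaultdict(list)
    for child, parent in is_a_edges:
        children[parent].append(child)
    included = set()
    for member in has_members:
        included |= descendants_or_self(member, children)
    excluded = set()
    for member in not_members:
        excluded |= descendants_or_self(member, children)
    return included - excluded


family_history = expand(["sn:57177007"], ["sn:160266009"], IS_A)
print(sorted(family_history))
```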


Sets and codes

The health service still relies on codes to identify concepts (classes) and concepts as sets.

As in the ontology as a whole, a concept may be associated with many codes from many schemes. These codes can then be used in categorising outputs.

In this situation a concept set is defined as above but, when it is used, the query may wish to output the category. The following is an example query for an ethnic category value set expansion.

The query says "get the concept and NHS Data Dictionary category code of the members of the ethnicity 2001 set".

Logically this SPARQL graph query traverses from the set identifier (2001 census ethnic categories) to its subsets (e.g. African) and then to the code of the subset (e.g. N), the scheme of the code being NHS Data Dictionary Ethnic Categories 2001.

select ?concept ?category
where {
    ?concept im:isA* ?member.
    ?member ^im:hasMember ?subset.
    ?subset ^im:hasSubset sn:92381000000106.    #The 2001 census ethnic categories
    ?termCode ^im:hasTermCode ?subset
     filter exists {?termCode ?hasScheme im:NHSDataDictionaryEthnicCategory2001 }.
    ?category ^im:code ?termCode
}
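
The same traversal (set, to subset, to member, to descendants, carrying the subset's category code along) can be sketched in Python. This is an illustration only: the ex: identifiers and the "is a" edge are hypothetical, the dictionary layout is an assumption, and the category codes are included purely as sample data.

```python
# Hypothetical subsets of the ethnic category set, each with its member
# expressions and its category code in the chosen scheme.
SUBSETS = {
    "ex:African": {"members": ["ex:AfricanConcept"], "code": "N"},
    "ex:Bangladeshi": {"members": ["ex:BangladeshiConcept"], "code": "K"},
}

# Toy "is a" edges as (child, parent). Illustrative only.
IS_A = [("ex:NigerianConcept", "ex:AfricanConcept")]


def categorise(subsets, is_a_edges):
    """Map every concept in the expanded set to its subset's category code."""
    children = {}
    for child, parent in is_a_edges:
        children.setdefault(parent, []).append(child)
    category_of = {}
    for subset in subsets.values():
        stack = list(subset["members"])
        while stack:
            concept = stack.pop()
            if concept not in category_of:
                category_of[concept] = subset["code"]
                stack.extend(children.get(concept, []))
    return category_of


category_of = categorise(SUBSETS, IS_A)
print(category_of)
```

In this sketch a concept reachable from two subsets keeps the category of the first subset visited; whether that situation can arise, and how it should be resolved, is not stated in the text.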


An example of generated SQL, operating on a relational model, which logically performs the same expansion could be:

select locals.code as localCode, tc.code as categoryCode
from entity e
-- set -> subset, via the hasSubset predicate
join tpl on tpl.subject = e.dbid
join entity subset on tpl.object = subset.dbid
join entity hasSubsets on tpl.predicate = hasSubsets.dbid
-- subset -> its category code in the NHS Data Dictionary scheme
join term_code tc on tc.entity = subset.dbid
join entity scheme on tc.scheme = scheme.dbid
-- subset -> member, via the hasMembers predicate
join tpl tpl2 on tpl2.subject = subset.dbid
join entity aMember on tpl2.object = aMember.dbid
join entity hasMember on tpl2.predicate = hasMember.dbid
-- member -> all descendants (tct appears to be the transitive closure table)
join tct on tct.ancestor = aMember.dbid
join entity allMembers on tct.descendant = allMembers.dbid
-- descendant -> its local codes
left join term_code locals on locals.entity = allMembers.dbid
join entity localScheme on locals.scheme = localScheme.dbid
where e.iri = 'http://snomed.info/sct#92381000000106'
  and hasSubsets.iri = 'http://endhealth.info/im#hasSubset'
  and scheme.iri = 'http://endhealth.info/im#NHSDataDictionaryEthnicCategory2001'
  and hasMember.iri = 'http://endhealth.info/im#hasMembers'
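
To see the shape of such a query in action, the following is a minimal sketch that runs a reduced version against an in-memory SQLite database from Python. The three-table schema is a simplified assumption based on the table and column names used above (the term_code joins are omitted), and every ex: identifier and row is hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table entity (dbid integer primary key, iri text);
create table tpl (subject integer, predicate integer, object integer);
create table tct (ancestor integer, descendant integer);

insert into entity values
  (1, 'ex:EthnicSet'), (2, 'ex:hasSubset'), (3, 'ex:hasMembers'),
  (4, 'ex:AfricanSubset'), (5, 'ex:AfricanConcept'), (6, 'ex:NigerianConcept');

-- set -> subset, and subset -> member
insert into tpl values (1, 2, 4), (4, 3, 5);

-- closure rows include each concept as its own descendant
insert into tct values (5, 5), (5, 6), (6, 6);
""")

rows = conn.execute(
    """
    select subset.iri, allMembers.iri
    from entity e
    join tpl on tpl.subject = e.dbid
    join entity hasSubset on tpl.predicate = hasSubset.dbid
    join entity subset on tpl.object = subset.dbid
    join tpl tpl2 on tpl2.subject = subset.dbid
    join entity hasMembers on tpl2.predicate = hasMembers.dbid
    join tct on tct.ancestor = tpl2.object
    join entity allMembers on tct.descendant = allMembers.dbid
    where e.iri = ? and hasSubset.iri = ? and hasMembers.iri = ?
    order by allMembers.iri
    """,
    ("ex:EthnicSet", "ex:hasSubset", "ex:hasMembers"),
).fetchall()

for subset_iri, member_iri in rows:
    print(subset_iri, member_iri)
```

Precomputing the closure table means the query needs only a single join to reach all descendants, at the cost of maintaining the table whenever the hierarchy changes.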