Sets and classes

From Discovery Data Service
Latest revision as of 10:37, 7 July 2021

All sets are classes. Not all classes are sets.

Sets are collections of things. Members of a set (items in a set) may or may not share properties with each other.

Classes are collections of things with common properties. Members of a class all share the properties of the class definition. A member of a class that is specialised by having properties additional to those of the class definition is considered a subclass.

Sets are the single most significant structure used in health query. Many terms are in common use to describe different purposes of sets. The information model uses the terms "concept set" and "value set": the former is defined for use in query, and the latter defines the expected values of properties in records.

Snomed-CT uses the term "reference set", which is semantically the same as a concept set. Some use the term "code set" when they are interested in the codes of the concepts in a set.

It should be noted that an entity may be explicitly defined as being both a class (e.g. a Snomed-CT concept) and a set (e.g. Ethnic category - 2001 Census). In other words, an entity can have two types.

This creates an interesting twist. A search for the subclasses of Ethnic category - 2001 Census would produce different results from a search for all members of the subsets of Ethnic category - 2001 Census. In other words, how the concept is treated is determined by the use case and the query.
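The twist can be illustrated with a toy in-memory store. This is a minimal sketch in which every identifier is hypothetical rather than a real IM or Snomed-CT IRI. The same entity answers two different questions depending on whether it is treated as a class (follow isA downwards) or as a set (follow subsets to their members):

```python
# Illustrative only: all identifiers are hypothetical, not real IM or Snomed-CT IRIs.
# Ontology view: each concept maps to its direct isA parents.
IS_A = {
    "african_finding": {"ethnic_2001_finding"},
    "bangladeshi_finding": {"ethnic_2001_finding"},
}

# Set view of the same entity: it has subsets, and each subset has members.
HAS_SUBSET = {"ethnic_2001_finding": ["african_subset", "bangladeshi_subset"]}
HAS_MEMBER = {
    "african_subset": ["african_finding", "black_african_concept"],
    "bangladeshi_subset": ["bangladeshi_finding"],
}


def subclasses_of(entity: str) -> set[str]:
    """Treat the entity as a class: its proper subclasses, direct and indirect."""
    found: set[str] = set()
    frontier = [entity]
    while frontier:
        parent = frontier.pop()
        for child, parents in IS_A.items():
            if parent in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def members_of_subsets(entity: str) -> set[str]:
    """Treat the entity as a set: the members listed in each of its subsets."""
    return {
        member
        for subset in HAS_SUBSET.get(entity, [])
        for member in HAS_MEMBER.get(subset, [])
    }
```

Here `subclasses_of("ethnic_2001_finding")` returns only the two findings. `members_of_subsets("ethnic_2001_finding")` also returns `black_african_concept`, which is a member of a subset without being a subclass.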

Set definitions can be simple or complex and fall into 2 main patterns:

Sets whose members are expressions

This pattern forms the majority of set definitions.

The set is given an identifier (an IRI) and is defined as a concept in its own right. This set pattern then has two predicates:

  1. has Members: a collection of class expressions (e.g. simple concepts or complex expressions)
  2. not Members: a collection of class expressions that are excluded from the set, i.e. those that would otherwise have been included as subclasses of the 'has Members' collection
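The two predicates can be read as a subsumption test on a single record concept. The concept is in the set if it is subsumed by some 'has Members' entry and by no 'not Members' entry. This is a minimal sketch that assumes a plain parent map for isA. The concept names are hypothetical and only loosely modelled on the family history example that follows:

```python
# Hypothetical isA hierarchy: each concept maps to its direct parents.
IS_A = {
    "fh_of_diabetes": {"fh_with_context"},
    "no_fh_of_finding": {"fh_with_context"},
    "no_fh_of_cancer": {"no_fh_of_finding"},
}


def ancestors_or_self(concept: str) -> set[str]:
    """Reflexive transitive closure of isA, walking upwards from the concept."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def in_value_set(concept: str, has_members: set[str], not_members: set[str]) -> bool:
    """True if subsumed by at least one 'has Members' entry and by no 'not Members' entry."""
    up = ancestors_or_self(concept)
    return bool(up & has_members) and not (up & not_members)
```

For example, `in_value_set("no_fh_of_cancer", {"fh_with_context"}, {"no_fh_of_finding"})` is `False`. The concept sits under the included member, but it also sits under the excluded one.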

For example, the following set definition for a positive family history has one member and one excluded member:

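@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix im:   <http://endhealth.info/im#> .
@prefix sn:   <http://snomed.info/sct#> .
im:VSET_RecordType_FamilyHistory                # This is a value set for family history
	rdf:type im:ConceptSet ;
	rdfs:label "Family history" ;
	rdfs:comment "Family history value set not including negative family history" ;
	im:notMembers sn:160266009 ;                 # |No family history of clinical finding (situation)|
	im:hasMembers sn:57177007 .                  # |Family history with explicit context (situation)|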

Sets whose members are subsets

This pattern is used for sets that contain categories, each category being another set.

Categorising set members is a common approach for presenting aggregate data. For example, the set "Ethnic category - 2001 Census" would be modelled as a set with approximately 17 categories (subsets), each category containing many members.

The set is given an identifier and has one predicate:

  1. has Subset: a collection of subsets, each of which may be either a member-based set or a subset-based set
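Because a subset may itself be subset-based, resolving a set of subsets into its member-based categories is naturally recursive. This is a minimal sketch over a hypothetical dictionary store; the names are illustrative, and the store is assumed to contain no cycles:

```python
# Hypothetical store: a set is either member based (HAS_MEMBERS) or subset based (HAS_SUBSETS).
HAS_SUBSETS = {
    "census_set": ["category_a", "group_b"],
    "group_b": ["category_b1", "category_b2"],  # a subset that is itself subset based
}
HAS_MEMBERS = {
    "category_a": ["concept_a"],
    "category_b1": ["concept_b1"],
    "category_b2": ["concept_b2", "concept_b2_extra"],
}


def member_based_categories(set_id: str) -> dict[str, list[str]]:
    """Flatten nested subsets into {member-based category: its members}.

    Assumes the subset graph has no cycles; a real store would need a visited check.
    """
    if set_id in HAS_MEMBERS:
        return {set_id: list(HAS_MEMBERS[set_id])}
    categories: dict[str, list[str]] = {}
    for subset in HAS_SUBSETS.get(set_id, []):
        categories.update(member_based_categories(subset))
    return categories
```

Each key in the result is a reporting category, which is what aggregate outputs are grouped by.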

The following example lists two of the categories of the ethnic category set.

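@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix im:   <http://endhealth.info/im#> .
@prefix sn:   <http://snomed.info/sct#> .
sn:92381000000106                                # |Ethnic category - 2001 census (finding)|
  rdf:type owl:Class, im:ConceptSet ;
  rdfs:label "Ethnic category - 2001 census (finding)" ;
  im:hasSubsets sn:92491000000104,               # |African - ethnic category 2001 census (finding)|
                sn:92471000000103 .              # |Bangladeshi or British Bangladeshi - ethnic category 2001 census (finding)|
                # ... etc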

Set definitions versus set expansions

Sets are usually defined with only a few members, although in some cases they can get quite large. Each member is a class definition. In general it is better to define sets with "high level" concepts, so that when new child concepts are added later, the set actually used is updated automatically.
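The benefit of high-level definitions can be shown with a small sketch. The definition stores one concept and never changes. The expansion is recomputed from the current hierarchy, so a child added in a later release appears automatically. The concept names and the `add_is_a` helper are hypothetical:

```python
from collections import defaultdict

# Hypothetical hierarchy held as parent -> direct children.
children: defaultdict[str, set[str]] = defaultdict(set)


def add_is_a(child: str, parent: str) -> None:
    """Record that child isA parent (hypothetical helper)."""
    children[parent].add(child)


def expand(members: set[str]) -> set[str]:
    """Every member plus all of its descendants (a class is a subclass of itself)."""
    result: set[str] = set()
    stack = list(members)
    while stack:
        concept = stack.pop()
        if concept not in result:
            result.add(concept)
            stack.extend(children[concept])
    return result


definition = {"renal_disease"}  # the set definition itself never changes
add_is_a("chronic_renal_impairment", "renal_disease")
before = expand(definition)
add_is_a("end_stage_renal_disease", "chronic_renal_impairment")  # a later ontology release
after = expand(definition)
```

After the second release, `after` contains `end_stage_renal_disease` even though `definition` was never edited.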

When used in query, it is usually necessary to use the "expanded" set. This is in effect all of the instances of the subclasses of each member, obtained via a transitive closure of the "isa" or "subClassOf" predicate. (N.B. a class is always a subclass of itself.)
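The closure can also be precomputed once as (ancestor, descendant) rows, with each concept paired with itself. Expanding a member then becomes a simple lookup. This sketch is similar in spirit to the `tct` table joined in the SQL example on this page, but that table's exact semantics are an assumption here:

```python
def closure_table(is_a: dict[str, set[str]]) -> set[tuple[str, str]]:
    """is_a maps each concept to its direct parents; returns all (ancestor, descendant) pairs."""
    concepts = set(is_a) | {p for parents in is_a.values() for p in parents}
    rows: set[tuple[str, str]] = set()
    for concept in concepts:
        stack, seen = [concept], {concept}  # reflexive: each concept is its own ancestor
        while stack:
            for parent in is_a.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        rows.update((ancestor, concept) for ancestor in seen)
    return rows


# Hypothetical chain c isA b isA a.
IS_A = {"b": {"a"}, "c": {"b"}}
TABLE = closure_table(IS_A)
# Expanding member "a" is now a lookup of all rows whose ancestor is "a".
expanded = {d for (a, d) in TABLE if a == "a"}
```

The trade-off is storage for speed: the table grows with hierarchy depth, but each expansion becomes an indexed filter rather than a traversal.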

Thus the following query would be used to expand a member based concept set as above. Note that the im:isA predicate is the same as the Snomed-CT "is a" relationship and the same as rdfs:subClassOf.

PREFIX im: <http://endhealth.info/im#>
select ?concept
where {
    ?concept im:isA* ?member.
    ?member ^im:hasMember im:VSET_RecordType_FamilyHistory
}

--- Result ---
?concept
sn:64571000119104 |Family history of abdominal wall defect (situation)|
sn:430560006 |Family history of chronic renal impairment (situation)|
sn:959511000000100 |Family history of end stage renal disease (situation)|


Sets and codes

The health service still relies on codes to identify concepts.

As in the ontology as a whole, a concept may be associated with many codes from many schemes. These codes can then be used to categorise outputs.
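A concept can hold (scheme, code) pairs from several schemes. Outputs are categorised by choosing one scheme and reading each concept's code in it. This is a minimal sketch in which the concept names, the "local_scheme" entries and `code_in_scheme` are hypothetical. "N" for African follows the example on this page, and "K" is the NHS Data Dictionary 2001 code for Bangladeshi or British Bangladeshi:

```python
from collections import Counter
from typing import Optional

# Hypothetical data: each concept maps to (scheme, code) pairs from several schemes.
CODES = {
    "african_category": [("NHSDataDictionaryEthnicCategory2001", "N"), ("local_scheme", "LOC-1")],
    "bangladeshi_category": [("NHSDataDictionaryEthnicCategory2001", "K"), ("local_scheme", "LOC-2")],
}


def code_in_scheme(concept: str, scheme: str) -> Optional[str]:
    """Return the concept's code in one scheme, or None if it has none there."""
    return next((code for s, code in CODES.get(concept, []) if s == scheme), None)


# Categorising output: count record entries by their category code in a chosen scheme.
records = ["african_category", "bangladeshi_category", "african_category"]
counts = Counter(code_in_scheme(c, "NHSDataDictionaryEthnicCategory2001") for c in records)
```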

In this situation a concept set is defined as above, but when it is used, the query may wish to output the category. The following is an example query for the ethnic category value set expansion.

The query says "get the concept and NHS Data Dictionary category code of the members of the ethnicity 2001 set".

Logically, this SPARQL graph query traverses from the set identifier (2001 census ethnic categories) to its subsets (e.g. African). From each subset it goes to the subset's code (e.g. N) in the NHS Data Dictionary Ethnic Categories 2001 scheme, and to the subset's members and their subclasses.

PREFIX im: <http://endhealth.info/im#>
PREFIX sn: <http://snomed.info/sct#>
select ?concept ?category
where {
    ?concept im:isA* ?member.
    ?member ^im:hasMember ?subset.
    ?subset ^im:hasSubset sn:92381000000106.    # The 2001 census ethnic categories
    ?termCode ^im:hasTermCode ?subset.
    filter exists { ?termCode im:hasScheme im:NHSDataDictionaryEthnicCategory2001 }
    ?category ^im:code ?termCode.
}
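The same traversal can be mirrored step by step over an in-memory triple set, which makes the role of each pattern explicit. This is a sketch, not the IM's implementation. The triples, the resource names and the "hasScheme" predicate are hypothetical stand-ins for whatever the store actually holds:

```python
# Hypothetical triples mirroring the graph the query walks.
TRIPLES = {
    ("ethnic_2001", "hasSubset", "african"),
    ("african", "hasMember", "african_member"),
    ("african", "hasTermCode", "tc_african"),
    ("tc_african", "hasScheme", "NHSDataDictionaryEthnicCategory2001"),
    ("tc_african", "code", "N"),
    ("african_member_child", "isA", "african_member"),
}


def objects(subject: str, predicate: str) -> set[str]:
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def subjects(predicate: str, obj: str) -> set[str]:
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def descendants_or_self(concept: str) -> set[str]:
    """Equivalent of ?concept im:isA* ?member: the member plus everything below it."""
    found, stack = {concept}, [concept]
    while stack:
        for child in subjects("isA", stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def concepts_with_category(set_id: str, scheme: str) -> set[tuple[str, str]]:
    """Return (concept, category code) pairs, following the query's patterns in order."""
    rows: set[tuple[str, str]] = set()
    for subset in objects(set_id, "hasSubset"):
        for term_code in objects(subset, "hasTermCode"):
            if scheme not in objects(term_code, "hasScheme"):
                continue  # the FILTER EXISTS step
            for category in objects(term_code, "code"):
                for member in objects(subset, "hasMember"):
                    for concept in descendants_or_self(member):
                        rows.add((concept, category))
    return rows
```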


The following is an example of SQL generated for a relational model. It logically performs the same expansion, extended to include the non-core (local) codes:

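-- For each subset of the 2001 census ethnic category set, return its NHS Data Dictionary
-- category code and the local codes of every concept subsumed by the subset's members.
select locals.code as localCode, tc.code as categoryCode
from entity e
join tpl on tpl.subject = e.dbid                                  -- triples whose subject is the set
join entity subset on tpl.object = subset.dbid
join entity hasSubsets on tpl.predicate = hasSubsets.dbid
join term_code tc on tc.entity = subset.dbid                      -- the subset's category code...
join entity scheme on tc.scheme = scheme.dbid                     -- ...restricted to one scheme below
join tpl tpl2 on tpl2.subject = subset.dbid                       -- triples whose subject is the subset
join entity aMember on tpl2.object = aMember.dbid
join entity hasMember on tpl2.predicate = hasMember.dbid
join tct on tct.ancestor = aMember.dbid                           -- transitive closure: member and its descendants
join entity allMembers on tct.descendant = allMembers.dbid
left join term_code locals on locals.entity = allMembers.dbid    -- local codes, if any
left join entity localScheme on locals.scheme = localScheme.dbid -- left join, so uncoded concepts are kept
where e.iri = 'http://snomed.info/sct#92381000000106'
  and hasSubsets.iri = 'http://endhealth.info/im#hasSubset'
  and scheme.iri = 'http://endhealth.info/im#NHSDataDictionaryEthnicCategory2001'
  and hasMember.iri = 'http://endhealth.info/im#hasMembers';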
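The query can be exercised against a tiny in-memory SQLite database to check the join logic. The table layouts are assumptions inferred only from the column names the query uses. The member, child and local-scheme entities are hypothetical; the set, predicate and scheme IRIs are those in the query. Entity 9 has no local code, which shows why the final joins need to be left joins:

```python
import sqlite3

# Assumed minimal schema, inferred from the columns the query references.
SCHEMA = """
create table entity (dbid integer primary key, iri text);
create table tpl (subject integer, predicate integer, object integer);
create table term_code (entity integer, scheme integer, code text);
create table tct (ancestor integer, descendant integer);
"""

QUERY = """
select locals.code as localCode, tc.code as categoryCode
from entity e
join tpl on tpl.subject = e.dbid
join entity subset on tpl.object = subset.dbid
join entity hasSubsets on tpl.predicate = hasSubsets.dbid
join term_code tc on tc.entity = subset.dbid
join entity scheme on tc.scheme = scheme.dbid
join tpl tpl2 on tpl2.subject = subset.dbid
join entity aMember on tpl2.object = aMember.dbid
join entity hasMember on tpl2.predicate = hasMember.dbid
join tct on tct.ancestor = aMember.dbid
join entity allMembers on tct.descendant = allMembers.dbid
left join term_code locals on locals.entity = allMembers.dbid
left join entity localScheme on locals.scheme = localScheme.dbid
where e.iri = 'http://snomed.info/sct#92381000000106'
  and hasSubsets.iri = 'http://endhealth.info/im#hasSubset'
  and scheme.iri = 'http://endhealth.info/im#NHSDataDictionaryEthnicCategory2001'
  and hasMember.iri = 'http://endhealth.info/im#hasMembers'
"""


def run_example() -> set[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into entity values (?, ?)", [
        (1, "http://snomed.info/sct#92381000000106"),   # the set
        (2, "http://endhealth.info/im#hasSubset"),
        (3, "http://endhealth.info/im#hasMembers"),
        (4, "http://endhealth.info/im#NHSDataDictionaryEthnicCategory2001"),
        (5, "http://snomed.info/sct#92491000000104"),   # African subset
        (6, "urn:example:memberConcept"),                # hypothetical member
        (7, "urn:example:childConcept"),                 # hypothetical descendant
        (8, "urn:example:localScheme"),                  # hypothetical local scheme
        (9, "urn:example:uncodedChild"),                 # descendant with no local code
    ])
    conn.executemany("insert into tpl values (?, ?, ?)", [(1, 2, 5), (5, 3, 6)])
    conn.executemany("insert into term_code values (?, ?, ?)",
                     [(5, 4, "N"), (6, 8, "LOC-6"), (7, 8, "LOC-7")])
    conn.executemany("insert into tct values (?, ?)", [(6, 6), (6, 7), (6, 9)])
    rows = set(conn.execute(QUERY).fetchall())
    conn.close()
    return rows
```

With these rows the result is three pairs: both local codes with category "N", plus `(None, "N")` for the uncoded child. An inner join on `localScheme` would silently drop that third row.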