Sets and classes

Latest revision as of 10:37, 7 July 2021

All sets are classes. Not all classes are sets.

Sets are collections of things. Members of a set (items in a set) may or may not share properties with each other.

Classes are collections of things with common properties. Members of a class all share the properties of the class definition. A member of a class that is specialised by having properties additional to those of the class is considered a subclass.

Sets are the single most significant structure used in health queries. Many terms are in common use to describe different purposes of sets. In the information model the terms "concept sets" and "value sets" are the ones used, the former being defined for use in query and the latter to define expected values of properties in records.

Snomed-CT uses the term "reference set", which is semantically the same as a concept set. Some use the term "code sets" when interested in the codes of the concepts in a set.

It should be noted that an entity may be explicitly defined as being both a class (e.g. a Snomed-CT concept) and a set (e.g. Ethnic category - 2001 Census). In other words, an entity can have two types.

This creates an interesting twist: a search for the subclasses of Ethnic category - 2001 Census would produce different results from a search for all members of the subsets of Ethnic category - 2001 Census. In other words, how the concept is treated is determined by the use case and the query.
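The point about one entity having two types can be illustrated with a small Python sketch. Everything in it is hypothetical: the `ex:` identifiers, the three dictionaries and the two helper functions are invented for illustration and are not part of the information model.

```python
# Hypothetical toy graph: one entity that is both a class and a set.
SUBCLASS_OF = {"ex:CensusSubcategory": "ex:EthnicCategory2001"}   # child -> parent
HAS_SUBSET = {"ex:EthnicCategory2001": ["ex:AfricanSubset"]}      # set -> subsets
HAS_MEMBER = {"ex:AfricanSubset": ["ex:Nigerian", "ex:Somali"]}   # subset -> members


def subclasses(entity):
    """Treat the entity as a class: return its direct subclasses."""
    return {child for child, parent in SUBCLASS_OF.items() if parent == entity}


def members_of_subsets(entity):
    """Treat the entity as a set: return the members of its subsets."""
    return {
        member
        for subset in HAS_SUBSET.get(entity, [])
        for member in HAS_MEMBER.get(subset, [])
    }


as_class = subclasses("ex:EthnicCategory2001")
as_set = members_of_subsets("ex:EthnicCategory2001")
print(as_class)
print(as_set)
```

The same identifier yields two different answers depending on which predicates the query follows.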

Set definitions can be simple or complex and fall into 2 main patterns:

Sets whose members are expressions

This pattern forms the majority of set definitions.

The set is given an identifier (an IRI) and is defined as a concept in its own right. This set pattern then has two predicates:

  1. has Members: a collection of class expressions (e.g. simple concepts or complex expressions).
  2. not Members: a collection of class expressions that are excluded from the set, i.e. those that would otherwise have been included as subclasses of the 'has Members' collection.

For example, the following set definition for a positive family history has one member and one excluded member:

im:VSET_RecordType_FamilyHistory             # This is a value set for family history
	rdf:type im:ConceptSet ;
	rdfs:label "Family history" ;
	rdfs:comment "Family history value set not including negative family history" ;
	im:notMembers sn:160266009 ;    # |No family history of clinical finding (situation)|
	im:hasMembers sn:57177007 .     # |Family history with explicit context (situation)|

Sets whose members are subsets

This pattern is used for sets that contain categories, each category being another set.

Categorising set members is a common approach for presenting aggregate data. For example, the set "Ethnic category - 2001 Census" would be modelled as a set with approximately 17 categories (subsets), each category containing many members.

The set is given an identifier and has one predicate:

  1. has Subset: a collection of subsets, each of which may be either a member-based set or a subset-based set.

The following example lists two of the categories of the ethnic category set.

sn:92381000000106         # |Ethnic category - 2001 census (finding)|
  rdf:type owl:Class , im:ConceptSet ;
  rdfs:label "Ethnic category - 2001 census (finding)" ;
  im:hasSubsets sn:92491000000104 ,   # |African - ethnic category 2001 census (finding)|
                sn:92471000000103     # |Bangladeshi or British Bangladeshi - ethnic category 2001 census (finding)|
                # ... etc
                .

Set definitions versus set expansions

Sets are usually defined with only a few members, although in some cases they can get quite large. Each member is a class definition. In general it is better to define sets with "high level" concepts, so that when new child concepts are added later the set actually used is updated.

When used in a query, it is usually necessary to use the "expanded" set, which is in effect all of the instances of the subclasses of each member, obtained via a transitive closure of the "isa" or "subClassOf" predicate. (N.B. a class is always a subclass of itself.)

Thus the following query would be used to expand a member-based concept set such as the one above. Note that the im:isA predicate is the same as the Snomed-CT "is a" relationship and the same as rdfs:subClassOf.

select ?concept
where {
    ?concept im:isA* ?member.
    ?member ^im:hasMember im:VSET_RecordType_FamilyHistory   
}

---Result ---
?concept
sn:64571000119104 |Family history of abdominal wall defect (situation)|
sn:430560006 |Family history of chronic renal impairment (situation)|
sn:959511000000100 |Family history of end stage renal disease (situation)|
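The expansion of a member-based set, including its excluded members, can also be sketched in plain Python. This is a minimal illustration under stated assumptions rather than the Discovery implementation: the `IS_A` parent links are simplified (real Snomed-CT paths may pass through intermediate concepts), `ex:NoFamilyHistoryOfAsthma` is a hypothetical concept, and the function names are invented.

```python
from collections import defaultdict

# Toy child -> parents "is a" links (simplified assumptions, see lead-in).
IS_A = {
    "sn:64571000119104": ["sn:57177007"],
    "sn:430560006": ["sn:57177007"],
    "sn:959511000000100": ["sn:430560006"],
    "sn:160266009": ["sn:57177007"],
    "ex:NoFamilyHistoryOfAsthma": ["sn:160266009"],  # hypothetical concept
}


def descendants_or_self(concept, is_a):
    """Transitive closure of "is a" from a concept; a class is a subclass of itself."""
    children = defaultdict(list)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].append(child)
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def expand(has_members, not_members, is_a):
    """Expand every included member, then remove the expansion of every excluded member."""
    included = set()
    for member in has_members:
        included |= descendants_or_self(member, is_a)
    excluded = set()
    for member in not_members:
        excluded |= descendants_or_self(member, is_a)
    return included - excluded


family_history = expand(["sn:57177007"], ["sn:160266009"], IS_A)
print(sorted(family_history))
```

Because the definition names only the high-level concept, adding a new child to `IS_A` changes the expansion without touching the definition.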


Sets and codes

The health service still relies on codes to identify concepts.

As in the ontology as a whole, a concept may be associated with many codes from many schemes. These codes can then be used in categorising outputs.

In this situation a concept set is defined as above, but when it is used the query may wish to output the category. The following is an example query for an ethnic category value set expansion.

The query says "get the concept and NHS Data Dictionary category code of the members of the ethnicity 2001 set".

Logically, this SPARQL graph query traverses from the set identifier (2001 census ethnic categories) to its subsets (e.g. African) and then to the code of each subset (e.g. N), the scheme of the code being NHS Data Dictionary Ethnic Categories 2001.

select ?concept ?category
where {
    ?concept im:isA* ?member.
    ?member ^im:hasMember ?subset.
    ?subset ^im:hasSubset sn:92381000000106.    #The 2001 census ethnic categories
    ?termCode ^im:hasTermCode ?subset
     filter exists {?termCode ?hasScheme im:NHSDataDictionaryEthnicCategory2001 }.
    ?category ^im:code ?termCode
}
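The same traversal can be followed step by step in a small Python sketch over an in-memory list of triples. The two subset identifiers and the category code N come from the text above; the `ex:` member concepts, the code K for the Bangladeshi subset, and the helper names are assumptions made for illustration.

```python
# (subject, predicate, object) triples; ex: concepts are hypothetical.
TRIPLES = [
    ("sn:92381000000106", "im:hasSubset", "sn:92491000000104"),
    ("sn:92381000000106", "im:hasSubset", "sn:92471000000103"),
    ("sn:92491000000104", "im:hasMember", "ex:African"),
    ("sn:92471000000103", "im:hasMember", "ex:Bangladeshi"),
    ("ex:Nigerian", "im:isA", "ex:African"),
    ("ex:Somali", "im:isA", "ex:African"),
    ("ex:BritishBangladeshi", "im:isA", "ex:Bangladeshi"),
]

# Category code of each subset in the chosen scheme (K is an assumption).
CATEGORY_CODE = {"sn:92491000000104": "N", "sn:92471000000103": "K"}


def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def is_a_star(member):
    """Equivalent of the im:isA* path: the member plus all of its descendants."""
    found = {member}
    frontier = [member]
    while frontier:
        current = frontier.pop()
        for child in subjects("im:isA", current):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found


def concepts_with_category(set_iri):
    """Walk set -> subset -> member -> descendants, pairing each concept with its category."""
    rows = []
    for subset in objects(set_iri, "im:hasSubset"):
        for member in objects(subset, "im:hasMember"):
            for concept in is_a_star(member):
                rows.append((concept, CATEGORY_CODE[subset]))
    return sorted(rows)


rows = concepts_with_category("sn:92381000000106")
for concept, category in rows:
    print(concept, category)
```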


An example of SQL generated from a mapping onto a relational model, which logically performs the same expansion and is extended to include the non-core codes, could be:

-- Local (non-core) codes of every expanded member, with the category code of its subset
select locals.code as localCode, tc.code as categoryCode
from entity e
-- set -> subsets
join tpl on tpl.subject = e.dbid
join entity subset on tpl.object = subset.dbid
join entity hasSubsets on tpl.predicate = hasSubsets.dbid
-- subset -> category code in the chosen scheme
join term_code tc on tc.entity = subset.dbid
join entity scheme on tc.scheme = scheme.dbid
-- subset -> members
join tpl tpl2 on tpl2.subject = subset.dbid
join entity aMember on tpl2.object = aMember.dbid
join entity hasMember on tpl2.predicate = hasMember.dbid
-- member -> all descendants via the transitive closure table
join tct on tct.ancestor = aMember.dbid
join entity allMembers on tct.descendant = allMembers.dbid
-- descendant -> local codes; both joins are left joins so concepts without a local code are kept
left join term_code locals on locals.entity = allMembers.dbid
left join entity localScheme on locals.scheme = localScheme.dbid
where e.iri = 'http://snomed.info/sct#92381000000106'
  and hasSubsets.iri = 'http://endhealth.info/im#hasSubset'
  and scheme.iri = 'http://endhealth.info/im#NHSDataDictionaryEthnicCategory2001'
  and hasMember.iri = 'http://endhealth.info/im#hasMembers'
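The query above relies on a transitive closure table, tct, with ancestor and descendant columns. As a rough sketch of how such rows could be derived from direct "is a" links, the following Python function computes (ancestor, descendant) pairs. The function name and the toy edges are hypothetical, and including the self pairs is an assumption that follows the rule that a class is always a subclass of itself.

```python
def build_closure(is_a_edges):
    """Return sorted (ancestor, descendant) pairs from (child, parent) edges."""
    parents = {}
    for child, parent in is_a_edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())
    rows = set()
    for concept in parents:
        rows.add((concept, concept))  # self pair: a class is a subclass of itself
        stack = list(parents[concept])
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            rows.add((ancestor, concept))
            stack.extend(parents[ancestor])
    return sorted(rows)


# Hypothetical three-level hierarchy: C is a B, B is an A.
closure = build_closure([("C", "B"), ("B", "A")])
for ancestor, descendant in closure:
    print(ancestor, descendant)
```

Precomputing the pairs trades storage for query speed: the expansion becomes a single join on the ancestor column instead of a recursive walk.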