Sets and classes

From Discovery Data Service
Revision as of 10:28, 7 July 2021 by DavidStables

All sets are classes. Not all classes are sets.

Sets are collections of things. Members of a set (items in a set) may or may not share properties with each other.

Classes are collections of things with common properties. Members of a class all share some of their properties. A member that is specialised by having properties additional to those of the class is considered a subclass.

Sets are the single most significant structure used in health queries. Many different terms are in common use to describe sets. In the information model the terms used are "concept set" and "value set": the former is defined for use in queries, and the latter defines the expected values of properties in records.

Snomed-CT uses the term 'reference set', which is semantically the same as a concept set.

It should be noted that an entity may be explicitly defined as being both a class (e.g. a Snomed-CT concept) and a set (e.g. Ethnic category - 2001 Census). In other words, it is an entity with two types.

(This creates an interesting twist: a search for the subclasses of Ethnic category - 2001 Census would produce different results from a search for all members of the subsets of Ethnic category - 2001 Census.) In other words, how the concept is treated is determined by the use case.
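The difference between the two searches can be sketched in Python over a handful of triples. This is only an illustration: the "ex:" identifiers and the links between entities are invented, and the predicate names im:isA, im:hasSubset and im:hasMembers are assumed to match the ones used elsewhere on this page.

```python
# Hypothetical triples (subject, predicate, object).
# The "ex:" names are invented for illustration; only the sn: identifiers
# appear on this page, and the links between them are assumed.
TRIPLES = {
    ("ex:SubclassFinding", "im:isA", "sn:92381000000106"),
    ("sn:92381000000106", "im:hasSubset", "sn:92491000000104"),
    ("sn:92491000000104", "im:hasMembers", "ex:MemberConcept"),
}


def subclasses_of(entity, triples):
    """Treat the entity as a class: return its direct subclasses."""
    return {s for s, p, o in triples if p == "im:isA" and o == entity}


def members_of_subsets(entity, triples):
    """Treat the entity as a set: return the members of each of its subsets."""
    subsets = {o for s, p, o in triples if s == entity and p == "im:hasSubset"}
    return {o for s, p, o in triples if s in subsets and p == "im:hasMembers"}


as_class = subclasses_of("sn:92381000000106", TRIPLES)
as_set = members_of_subsets("sn:92381000000106", TRIPLES)
```

The same identifier yields two unrelated result sets, depending on whether the caller follows the class predicate or the set predicates.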

Set definitions can be simple or complex and fall into two main patterns:

Sets whose members are expressions

This pattern forms the majority of sets.

The set is given an identifier (an IRI) and is defined as a concept in its own right. This set pattern then has two predicates:

  1. has Members. A collection of class expressions (e.g. simple concepts or complex expressions).
  2. not Members. A collection of class expressions that are excluded from the set, i.e. those that would otherwise have been included as subclasses of the 'has Members' collection.

im:VSET_RecordType_FamilyHistory             # A value set for family history
	rdf:type im:ConceptSet ;
	rdfs:label "Family history" ;
	rdfs:comment "Family history value set not including negative family history" ;
	im:notMembers sn:160266009 ;    # No family history of clinical finding (situation)
	im:hasMembers sn:57177007 .     # Family history with explicit context (situation)

Sets whose members are subsets

This pattern is used for sets that contain categories, each category being another set.

Categorising set members is a common approach for presenting aggregate data. For example, the set "Ethnic category - 2001 Census" would be modelled as a set with approximately 17 categories (subsets), each category containing many members.

The set is given an identifier and has one predicate:

  1. has Subset. A collection of subsets, each of which may be either a member based set or a subset based set.

sn:92381000000106         # Ethnic category - 2001 census (finding)
  rdf:type owl:Class, im:ConceptSet ;
  rdfs:label "Ethnic category - 2001 census (finding)" ;
  im:hasSubset sn:92491000000104 ,    # African - ethnic category 2001 census (finding)
               sn:92471000000103      # Bangladeshi or British Bangladeshi - ethnic category 2001 census (finding)
               # ... etc
               .

Set definitions versus set expansions

Sets are usually defined with only a few members. Each member is a class definition.

When a set is used in a query it is usually necessary to use the "expanded" set, which is in effect each member together with all of its subclasses, obtained via a transitive closure of the "isa" or "subClassOf" predicate.

Thus the following query would be used to expand a member based concept set as above. Note that the im:isA predicate is the same as the Snomed-CT "is a" relationship and the same as rdfs:subClassOf.

prefix im: <http://endhealth.info/im#>

select ?concept
where {
    ?concept im:isA* ?member .
    ?member ^im:hasMembers im:VSET_RecordType_FamilyHistory .
}

--- Result ---
?concept
sn:64571000119104 |Family history of abdominal wall defect (situation)|
sn:430560006 |Family history of chronic renal impairment (situation)|
sn:959511000000100 |Family history of end stage renal disease (situation)|
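The expansion step can also be sketched outside a graph database. The Python below is a minimal illustration, not the service's implementation: the identifiers come from the examples on this page, but the parent/child links in IS_A are assumed, and the function names are hypothetical. Unlike the query above, the sketch also subtracts the expansion of the excluded members, which is one reading of the 'not Members' predicate.

```python
from collections import defaultdict

# Toy "is a" edges as (child, parent) pairs. The identifiers appear on this
# page; the exact parent/child links are assumed for illustration only.
IS_A = [
    ("sn:430560006", "sn:57177007"),
    ("sn:959511000000100", "sn:430560006"),
    ("sn:64571000119104", "sn:57177007"),
    ("sn:160266009", "sn:57177007"),
]


def descendants_or_self(roots, is_a_edges):
    """Reflexive transitive closure: the roots plus everything below them."""
    children = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)
    seen = set()
    stack = list(roots)
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue
        seen.add(concept)
        stack.extend(children[concept])
    return seen


def expand_set(has_members, not_members, is_a_edges):
    """Expand the included members, then remove the expanded exclusions."""
    included = descendants_or_self(has_members, is_a_edges)
    excluded = descendants_or_self(not_members, is_a_edges)
    return included - excluded


expanded = expand_set({"sn:57177007"}, {"sn:160266009"}, IS_A)
```

With these assumed edges, the expansion contains the defining member and its three family history descendants, and omits the excluded "No family history" concept.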


Sets and codes

The health service still relies on codes to identify concepts (classes) and concepts as sets.

As in the ontology as a whole, a concept may be associated with many codes from many schemes. These codes can then be used in categorising outputs.

In this situation a concept set is defined as above but, when it is used, the query may wish to output the category. The following is an example query for the ethnic category value set expansion.

The query says "get the concept and NHS Data Dictionary category code of the members of the ethnicity 2001 set".

Logically this SPARQL graph query traverses from the set identifier (2001 census ethnic categories) to its subsets (e.g. African) to the code of the subset (e.g. N), the scheme of the code being NHS Data Dictionary Ethnic Categories 2001.

prefix im: <http://endhealth.info/im#>
prefix sn: <http://snomed.info/sct#>

select ?concept ?category
where {
    ?concept im:isA* ?member .
    ?member ^im:hasMembers ?subset .
    ?subset ^im:hasSubset sn:92381000000106 .    # The 2001 census ethnic categories
    ?termCode ^im:hasTermCode ?subset
     filter exists { ?termCode ?hasScheme im:NHSDataDictionaryEthnicCategory2001 } .
    ?category ^im:code ?termCode
}
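The traversal from set to subset to category code can be sketched in Python as follows. Everything here is a simplified assumption: the "ex:" member concepts are invented, the code "K" for the Bangladeshi subset is not given on this page, and the function name categorise is hypothetical. Only the two subset identifiers and the code "N" for African come from the text above.

```python
# Each subset carries one category code and a few defining members.
# The "ex:" members and the code "K" are assumed for illustration.
SUBSETS = {
    "sn:92491000000104": {"code": "N", "members": {"ex:AfricanMember"}},
    "sn:92471000000103": {"code": "K", "members": {"ex:BangladeshiMember"}},
}

# Assumed "is a" edges as (child, parent) pairs.
IS_A = [("ex:AfricanChild", "ex:AfricanMember")]


def categorise(subsets, is_a_edges):
    """Return (concept, category code) pairs for every expanded member."""
    children = {}
    for child, parent in is_a_edges:
        children.setdefault(parent, set()).add(child)
    rows = set()
    for subset in subsets.values():
        seen = set()
        stack = list(subset["members"])
        while stack:
            concept = stack.pop()
            if concept in seen:
                continue
            seen.add(concept)
            stack.extend(children.get(concept, ()))
        # Every concept in the expansion inherits the subset's category code.
        rows.update((concept, subset["code"]) for concept in seen)
    return rows


rows = categorise(SUBSETS, IS_A)
```

Each expanded concept is paired with the code of the subset it was reached through, which is the shape of result that the query above asks for.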


An example of SQL generated from this query, operating on a relational model and logically performing the same expansion, could be:

select locals.code as localCode, tc.code as categoryCode
from entity e
-- from the set to its subsets
join tpl on tpl.subject = e.dbid
join entity subset on tpl.object = subset.dbid
join entity hasSubsets on tpl.predicate = hasSubsets.dbid
-- from each subset to its category code and scheme
join term_code tc on tc.entity = subset.dbid
join entity scheme on tc.scheme = scheme.dbid
-- from each subset to its members
join tpl tpl2 on tpl2.subject = subset.dbid
join entity aMember on tpl2.object = aMember.dbid
join entity hasMember on tpl2.predicate = hasMember.dbid
-- from each member to its descendants
join tct on tct.ancestor = aMember.dbid
join entity allMembers on tct.descendant = allMembers.dbid
-- local codes of the expanded members
left join term_code locals on locals.entity = allMembers.dbid
join entity localScheme on locals.scheme = localScheme.dbid
where e.iri = 'http://snomed.info/sct#92381000000106'
  and hasSubsets.iri = 'http://endhealth.info/im#hasSubset'
  and scheme.iri = 'http://endhealth.info/im#NHSDataDictionaryEthnicCategory2001'
  and hasMember.iri = 'http://endhealth.info/im#hasMembers'