Sets and classes

From Discovery Data Service

All sets are classes. Not all classes are sets.

Sets are collections of things. Members of a set (items in a set) may or may not share properties with each other.

Classes are collections of things with common properties. Members of a class all share the properties of the class definition. A member of a class which is specialised by dint of having additional properties to the class is considered a subclass.

Sets are the single most significant structure used in health queries. Many terms are in common use to describe the different purposes of sets. In the information model the terms "concept sets" and "value sets" are used: the former for sets defined for use in query, the latter for sets that define the expected values of properties in records.

Snomed-CT uses the term 'reference set' which is semantically the same as a concept set. Some use the term "code sets" when interested in the codes of the concepts in a set.

It should be noted that an entity may be explicitly defined as being both a class (e.g. a Snomed-CT concept) and a set (e.g. Ethnic category - 2001 Census). In other words, an entity can have two types.

This creates an interesting twist: a search for the subclasses of Ethnic category - 2001 Census would produce different results from a search for all members of the subsets of Ethnic category - 2001 Census. In other words, how the concept is treated is determined by the use case and the query.

Set definitions can be simple or complex and fall into 2 main patterns:

Sets whose members are expressions

This pattern forms the majority of set definitions.

The set is given an identifier (an IRI) and is defined as a concept in its own right. This set pattern then has two predicates:

  1. has Members: a collection of class expressions (e.g. simple concepts or complex expressions).
  2. not Members: a collection of class expressions that are excluded from the set, i.e. those that would otherwise have been included as subclasses of the 'has Members' collection.

For example, the following set definition for a positive family history has one member and one excluded member:

im:VSET_RecordType_FamilyHistory             # This is a value set for family history
	rdf:type im:ConceptSet;
	rdfs:label "Family history";
	rdfs:comment "Family history value set not including negative family history";
	im:notMembers sn:160266009;    # |No family history of clinical finding (situation)|
	im:hasMembers sn:57177007.     # |Family history with explicit context (situation)|

Sets whose members are subsets

This pattern is used for sets that contain categories, each category being another set.

Categorising set members is a common approach for presenting aggregate data. For example, the set "Ethnic category - 2001 Census" would be modelled as a set with approximately 17 categories (subsets), each category containing many members.

The set is given an identifier and has one predicate:

  1. has Subset: a collection of subsets, each of which may be either a member-based set or a subset-based set.

The following example lists two of the categories of the ethnic category set.

sn:92381000000106         # |Ethnic category - 2001 census (finding)|
  rdf:type owl:Class, im:ConceptSet;
  rdfs:label "Ethnic category - 2001 census (finding)";
  im:hasSubsets sn:92491000000104,   # |African - ethnic category 2001 census (finding)|
                sn:92471000000103    # |Bangladeshi or British Bangladeshi - ethnic category 2001 census (finding)|
                # ... etc
  .

Set definitions versus set expansions

Sets are usually defined with only a few members, although in some cases they can get quite large. Each member is a class definition. In general it is better to define sets with "high level" concepts, so that when new child concepts are added later the set actually used is updated automatically.

When used in query it is usually necessary to use the "expanded" set, which is in effect all of the instances of the subclasses of each member, obtained via a transitive closure of the "isa" or "subClassOf" predicate. (N.B. a class is always a subclass of itself.)

Thus the following query would be used to expand a member based concept set as above. Note that the im:isA predicate is the same as the Snomed-CT "is a" relationship and the same as rdfs:subClassOf.

select ?concept
where {
    ?concept im:isA* ?member.
    ?member ^im:hasMember im:VSET_RecordType_FamilyHistory   
}

---Result ---
?concept
sn:64571000119104 |Family history of abdominal wall defect (situation)|
sn:430560006 |Family history of chronic renal impairment (situation)|
sn:959511000000100 |Family history of end stage renal disease (situation)|


Sets and codes

The health service still relies on codes to identify concepts.

As in the ontology as a whole, a concept may be associated with many codes from many schemes. These codes can then be used in categorising outputs.

In this situation a concept set is defined as above, but when it is used the query may wish to output the category. The following is an example query for the ethnic category value set expansion.

The query says "get the concept and NHS Data Dictionary category code of the members of the ethnicity 2001 set".

Logically, this SPARQL graph query traverses from the set identifier (2001 census ethnic categories) to its subsets (e.g. African) and then to the code of each subset (e.g. N), the scheme of the code being NHS Data Dictionary Ethnic Categories 2001.

select ?concept ?category
where {
    ?concept im:isA* ?member.
    ?member ^im:hasMember ?subset.
    ?subset ^im:hasSubset sn:92381000000106.    #The 2001 census ethnic categories
    ?termCode ^im:hasTermCode ?subset
     filter exists {?termCode ?hasScheme im:NHSDataDictionaryEthnicCategory2001 }.
    ?category ^im:code ?termCode
}
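
The traversal from set to subsets to members, with each expanded concept reported against its subset's category code, can be sketched as follows. This is a hedged Python illustration only: the dictionaries, the concept names and the function categorise are hypothetical, and the category codes are shown purely as sample values.

```python
# Hypothetical toy data standing in for the graph; none of these names are real identifiers.
HAS_SUBSET = {"ethnic_2001": ["african", "bangladeshi"]}
HAS_MEMBER = {
    "african": ["african_concept"],
    "bangladeshi": ["bangladeshi_concept"],
}
CATEGORY_CODE = {"african": "N", "bangladeshi": "K"}  # sample category codes per subset
CHILDREN = {
    "african_concept": ["nigerian", "somali"],
    "bangladeshi_concept": ["british_bangladeshi"],
}


def descendants_or_self(concept):
    """Return the concept and everything below it in the toy hierarchy."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(CHILDREN.get(current, []))
    return seen


def categorise(set_id):
    """Map every expanded concept of a subset-based set to its subset's category code."""
    result = {}
    for subset in HAS_SUBSET.get(set_id, []):
        for member in HAS_MEMBER.get(subset, []):
            for concept in descendants_or_self(member):
                result[concept] = CATEGORY_CODE[subset]
    return result


categories = categorise("ethnic_2001")
for concept in sorted(categories):
    print(concept, categories[concept])
```

Keeping the category code on the subset, rather than on each member, means a newly added child concept picks up its category through the hierarchy alone.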


An example of SQL generated from the mapped query, operating on a relational model, which logically performs the same expansion and is extended to include the non-core codes, could be:

select locals.code as localCode, tc.code as categoryCode
from entity e
join tpl on tpl.subject = e.dbid
join entity subset on tpl.object = subset.dbid
join entity hasSubsets on tpl.predicate = hasSubsets.dbid
join term_code tc on tc.entity = subset.dbid
join entity scheme on tc.scheme = scheme.dbid
join tpl tpl2 on tpl2.subject = subset.dbid
join entity aMember on tpl2.object = aMember.dbid
join entity hasMember on tpl2.predicate = hasMember.dbid
join tct on tct.ancestor = aMember.dbid
join entity allMembers on tct.descendant = allMembers.dbid
-- left joins keep concepts that have no local (non-core) code
left join term_code locals on locals.entity = allMembers.dbid
left join entity localScheme on locals.scheme = localScheme.dbid
where e.iri = 'http://snomed.info/sct#92381000000106'
  and hasSubsets.iri = 'http://endhealth.info/im#hasSubset'
  and scheme.iri = 'http://endhealth.info/im#NHSDataDictionaryEthnicCategory2001'
  and hasMember.iri = 'http://endhealth.info/im#hasMembers'