Remote Subscriber Database (RSD) SQL guide (Compass 1)

From Discovery Data Service
Revision as of 15:55, 18 May 2020 by JoC (talk | contribs)
Jump to navigation Jump to search

Introduction

This article describes the steps needed to create, and analyse, patient data selections from the Discovery database.

Datasets are needed to add patient data to a report. Each dataset represents the results gained from running a combination of SQL commands and stored procedures on a data source.

The following diagram shows the algorithm that is followed to generate these datasets:

The algorithm followed to generate datasets

Patient cohort

Creating the patient cohort

The first step, when a list of patients marked for observational study is received, is to create a patient cohort. Patient data usually comes into the DDS in the form of pseudonymised personal data. Additional information about the patient can be obtained from the Discovery database using the pseudo identifier or the pseudo NHS identifier.

The following entity diagram shows the relationships between patient and organisation:

General patient cohort

Patients are linked to the Discovery database using pseudo patient identifiers. However, if only pseudo NHS numbers are provided, the corresponding pseudo identifiers can be derived from the link distributor table. Patients are filtered by organisation ODS codes to ensure that only the patients of the required GP practices are selected.

Building the patient cohort SQL with pseudo NHS numbers

The following example SQL shows how a patient cohort is built using pseudo NHS numbers:

Patient cohort SQL with pseudo NHS numbers

Sometimes, the cohort is created with filtered observations.

The example below filters patients who are registered, alive, and have had diabetes. They belong to the age 12 years and over. Two additional tables are introduced for the query: observation(patients who have had diabetes) and episode of care (patients who are registered and have the appropriate registration type for this study).

Cohorts with filtered observations

Patients can be selected with a specific set of observations using a lookup table of relevant SNOMED codes.
Some patients prefer to withhold consent and are therefore are excluded from the cohort; the relevant SNOMED consent code can be checked in the observation table.
Patients are usually only select if they have registration type 2 (Regular/GMS).
Sometimes patients are also based on their age in years.

Building patient cohort SQL with filtered observations

Building patient cohort SQL with filtered observations

Observation cohort

Creating the observation cohort

Observation cohort are created to improve query performance; instead of scanning through the entire observation table, a sub-set of the observation table is created, based on the patient cohort.

Building the observation cohort SQL 

Building observation cohort SQL

Medications cohort

Creating the medications cohort

Medications cohort are created to improve query performance; instead of scanning through the entire observation table, a sub-set of the medication statement table is created, based on the patient cohort.

Patient, medication statement and organisation

Building the medication cohort SQL

Building medication cohort SQL

Other cohorts

Creating other cohorts

This step can apply to any large tables to support improved query performance. For example, the encounter_raw table is a good candidate to create a separate cohort for SQL query; you can query this table if you want information such as the number of face to face encounters, or the number of home visits.

Encounter raw entity

Building the encounter raw cohort SQL

Building encounter raw cohort SQL

Building datasets

This involves creating a number of tables in the database.

Patient demographic dataset

A typical patient demographic dataset includes the following information: Patient demographic dataset This information is translated into a 'create table' script, which is then run to create the dataset table in the database.

Create table script for a patient demographic dataset table

Create table script for patient demographic dataset

The dataset is then populated with pseudo identifiers from the patient cohort.

Demographic dataset.png

Separate update SQL queries then populate the columns of the dataset table. For example:

Column update dataset table.png

In the following example, the age and LSOA code are derived from the Patient table. The demographic dataset table is updated with those values when there is a match on the pseudo identifier.

If the update is large, it might be possible to optimise the update process by using batches; for example to add IMD2010:

IMD2010.png

Add registration start date, registration end date, practice ODS code, practice ODS name:

In the above example, two temporary tables are created; the first to hold the registration data, and the second to select patient records that have the latest registration date.

Add the total number of face to face visits:

Face to face visits.png

DROP TEMPORARY TABLE IF EXISTS noVisitsTotal;
CREATE TEMPORARY TABLE noVisitsTotal (
	row_id INT,
	group_by VARCHAR(255),
	visits INT, PRIMARY KEY(row_id) ) AS
SELECT (@row_no := @row_no + 1) AS row_id,
	b.group_by,
	b.visits
FROM (SELECT cr.group_by AS group_by,
	COUNT(DISTINCT e.clinical_effective_date) AS visits
	FROM cohort_gh2_encounter_raw e JOIN cohort_gh2 cr
	ON e.person_id = cr.person_id
	WHERE EXISTS (SELECT 'x' FROM gh2_f2fEncounters s
	WHERE s.term = e.fhir_original_term)
	AND e.clinical_effective_date IS NOT NULL
	AND e.fhir_original_term IS NOT NULL
	GROUP BY cr.group_by) b, (SELECT @row_no := 0) t;
SET @row_id = 0;
WHILE EXISTS (SELECT row_id from noVisitsTotal
		WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO
	UPDATE gh2_demographicsDataset d
	JOIN noVisitsTotal nvt ON d.pseudo_id = nvt.group_by
	SET d.F2fVisits_Total = nvt.visits
	WHERE nvt.row_id > @row_id AND nvt.row_id <= @row_id + 1000;
	SET @row_id = @row_id + 1000;
END WHILE;