Remote subscriber data definition

From Discovery Data Service
Revision as of 09:23, 10 June 2021 by JoC (talk | contribs)

This article describes how the Discovery Data Service makes data available for Remote Subscriber Databases (RSDs) and how the DDS Remote Filer application interacts with this to update an RSD. Although it is recommended that the Remote Filer application be used for updating RSDs, any other technical solution can be substituted provided it is able to match the Remote Filer behaviour as far as the points of interaction are concerned.

Subscriber database schema

DDS currently provides SQL scripts for creating subscriber databases, one for each of the two supported database engines. The scripts are the same regardless of the configuration options selected when setting up the feed from DDS (e.g. PI versus de-identified).

Please note the following:
  • The above are links to the public GitHub repository, which was in active development until early 2021. Development has since moved to private GitHub repositories, so the SQL schemas linked to above should be taken only as illustrations of the schema and not as the latest version (which can be provided on request).
  • The Remote Filer application currently supports loading data into these two database engines only. Part of the data feed (the reference data) is sent as raw SQL, and only these two SQL formats are supported.
  • DDS still supports data feeds to an older version of the subscriber database known as Compass v1 (the current version being v2). Although v1 is still supported, new instances of this will not be deployed and this article specifically addresses the v2 standard. Future developments/improvements to the DDS subscriber database will be iterative upgrades to v2.

Subscriber feeds

There are two separate feeds of data that DDS sends to each RSD:

  • Published Data Feed – this includes all patient data, plus some supporting data (clinicians and organisations for example) that is sent into DDS by external publishers.
  • Reference Data Feed – this includes lookups and mappings for clinical codes (Read2 to SNOMED for example) that are not directly published into DDS but are updated in subscriber databases.

Data for each feed is staged in a separate directory on the DDS SFTP server for each subscriber. The DDS Remote Filer application supports running in two different modes: one to download and process the Published Data Feed, and the other to download and process the Reference Data Feed.

If you replace the Remote Filer application with an alternative solution it must support both feeds.

DDS SFTP server

The DDS SFTP server is used to stage all data for all RSDs.

A user is created on this server for each DDS subscriber, with their own username and SSH certificate, to allow them to securely access the data intended for them.

If a DDS subscriber has multiple RSDs (for example, one for GP data and one for acute data), the same SFTP user is used for both RSDs.

For each RSD, the following three directories are created under the SFTP user home directory to:

  1. stage data for download for the Published Data Feed.
  2. upload feedback files related to the Published Data Feed (feedback files are explained later in this article).
  3. stage data for download for the Reference Data Feed.

To illustrate:

[Figure: DDS SFTP server.png]

Published Data Feed Staging Directory

This directory is used by DDS to stage the published data intended for the RSD and includes all the patient and clinical data. When DDS has data to make available to a subscriber it is placed in this directory.

The files placed in this directory are always named in the format:

<YYYYMMDDHHMMSS>_Subscriber_Data.zip

Where YYYYMMDDHHMMSS is the date and time the data is staged for collection. When a Remote Filer connects and downloads the files, they should be sorted by file name, so they are in date order, and applied in that order.
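
The ordering rule above can be sketched in Python as follows. This is a minimal illustration only, not the Remote Filer implementation: the function name `order_staged_files` is hypothetical, and the decision to skip file names that do not match the documented pattern is an assumption rather than documented behaviour.

```python
import re
from datetime import datetime

# File name format given in this article: <YYYYMMDDHHMMSS>_Subscriber_Data.zip
STAGED_FILE_PATTERN = re.compile(r"^(\d{14})_Subscriber_Data\.zip$")


def order_staged_files(file_names):
    """Return staged file names in the order they should be applied.

    Hypothetical helper: names that do not match the documented pattern,
    or whose timestamp is not a valid date and time, are skipped
    (an assumption, not documented DDS behaviour).
    """
    dated = []
    for name in file_names:
        match = STAGED_FILE_PATTERN.match(name)
        if match is None:
            continue  # not a staged Published Data Feed file
        try:
            staged_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
        except ValueError:
            continue  # 14 digits, but not a real date and time
        dated.append((staged_at, name))
    # Oldest first, so files are applied in the order they were staged.
    return [name for _, name in sorted(dated)]
```

Because the timestamp is fixed-width and zero-padded, a plain sort by file name gives the same order; parsing the timestamp simply adds a validity check on each name before it is applied.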