Inbound Data Transform Specifications: Difference between revisions

(Created page with "Introduction This article forms the basis of the technical implementation of a new inbound transform of data into the Discovery Data Service (DDS), specifically for non-transa...")
 
No edit summary
Line 1: Line 1:
Introduction
== Introduction ==
This article forms the basis of the technical implementation of a new inbound transform of data into the Discovery Data Service (DDS), specifically for non-transactional flat file formats.  
This article forms the basis of the technical implementation of any new inbound transform of data into the Discovery Data Service (DDS), specifically for non-transactional flat file formats.  
Throughout the document is yellow-highlighted text prefixed with “TODO”, which is where information is required. It is fully expected that this document cannot be completed in a single pass and will need to go back and forth between the DDS team and the publisher several times before both parties agree it is complete.
 
The aim is to answer the following questions:
The aim is to ask potential publishers to consider and answer the following questions:
How will the source data be sent to DDS? E.g. Will it be an SFTP push or pull? Will the files be encrypted?
 
How often will the source data be sent to DDS? E.g. will it be sent daily extract?
* How will the source data be sent to DDS? Will it be an SFTP push or pull? Will the files be encrypted?
How should the source data be mapped to the FHIR-based DDS data format? E.g. What file contains patient demographics? What column contains first name? What column contains middle names?
* How often will the source data be sent to DDS? Will it be sent in a daily extract?
How should any value sets should be mapped to equivalent DDS value sets? E.g. What values are used for different genders and what FHIR gender does each map to?
* How should the source data be mapped to the FHIR-based DDS data format? What file contains patient demographics? What column contains first name? What column contains middle names?
What clinical coding systems are used? E.g. Are clinical observations recorded using SNOMED CT, CTV3, Read2, ICD-10, OPCS-4 or some other nationally or locally defined system.
* How should any value sets be mapped to equivalent DDS value sets? What values are used for different genders and what FHIR gender does each map to?
How are source data records uniquely identified within the files? E.g. What is the primary identifier/key in each file and how do files reference each other?
* What clinical coding systems are used? Are clinical observations recorded using SNOMED CT, CTV3, Read2, ICD-10, OPCS-4 or some other nationally or locally defined system?
How are inserts, updates and deletes represented in the source data? E.g. Is there a “deleted” column?  
* How are source data records uniquely identified within the files? What is the primary identifier/key in each file and how do files reference each other?
Is there any special knowledge required to accurately process the data?  
* How are inserts, updates and deletes represented in the source data? Is there a 'deleted' column?  
Will the publishing of data be done in a phased approach? E.g. Will the demographics feed be turned on first, with more complex clinical structures at a later date?
* Is there any special knowledge required to accurately process the data?  
What files require bulk dumps for the first extract and which will start from a point in time? E.g. Will a full dump of organisational data be possible? Will a full dump of master patient list be possible?
* Will the publishing of data be done in a phased approach? Will the demographics feed be turned on first, with more complex clinical structures at a later date?
Is there any overlap with an existing transform supported by DDS? E.g. does the feed include any national standard file that DDS already supports?
* What files require bulk dumps for the first extract and which will start from a point in time? Will a full dump of organisational data be possible? Will a full dump of the master patient list be possible?
In completing this document, the above questions should be answered. The understanding of the answers will then be used by the DDS technical team to implement the technical transformation for the source data.  
* Is there any overlap with an existing transform supported by DDS? Does the feed include any national standard file that DDS already supports?
 
The understanding of the answers provided can then be used by the DDS technical team to implement the technical transformation for the source data.  
 
Completion and sign off of this document is a collaborative process, involving the data publisher (or their representatives) and the DDS technical team. The data publisher will have far greater understanding of their own data, and the DDS technical team have experience in multiple other transformations.

Revision as of 12:44, 20 April 2021

Introduction

This article forms the basis of the technical implementation of any new inbound transform of data into the Discovery Data Service (DDS), specifically for non-transactional flat file formats.

The aim is to ask potential publishers to consider and answer the following questions:

  • How will the source data be sent to DDS? Will it be an SFTP push or pull? Will the files be encrypted?
  • How often will the source data be sent to DDS? Will it be sent in a daily extract?
  • How should the source data be mapped to the FHIR-based DDS data format? What file contains patient demographics? What column contains first name? What column contains middle names?
  • How should any value sets be mapped to equivalent DDS value sets? What values are used for different genders and what FHIR gender does each map to?
  • What clinical coding systems are used? Are clinical observations recorded using SNOMED CT, CTV3, Read2, ICD-10, OPCS-4 or some other nationally or locally defined system?
  • How are source data records uniquely identified within the files? What is the primary identifier/key in each file and how do files reference each other?
  • How are inserts, updates and deletes represented in the source data? Is there a 'deleted' column?
  • Is there any special knowledge required to accurately process the data?
  • Will the publishing of data be done in a phased approach? Will the demographics feed be turned on first, with more complex clinical structures at a later date?
  • What files require bulk dumps for the first extract and which will start from a point in time? Will a full dump of organisational data be possible? Will a full dump of the master patient list be possible?
  • Is there any overlap with an existing transform supported by DDS? Does the feed include any national standard file that DDS already supports?
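
To make the mapping, value set, identifier and delete questions above more concrete, the following Python sketch shows the kind of rules the answers would feed into. It is illustrative only and is not the DDS implementation: the file layout, the column names (PatientID, FirstName, MiddleNames, Gender, Deleted), the source gender codes and the "Y" delete flag are all assumptions invented for this example. Only the four target gender codes (male, female, other, unknown) come from the FHIR administrative gender value set.

```python
import csv
import io

# Hypothetical publisher gender codes mapped to FHIR administrative gender codes.
# The source codes on the left are assumptions; a real publisher supplies their own.
GENDER_MAP = {"M": "male", "F": "female", "I": "other", "U": "unknown"}


def transform_row(row):
    """Turn one source row into an (action, resource) pair.

    Assumes PatientID is the primary key and that a 'Deleted' column
    holding 'Y' marks a delete (both are hypothetical conventions).
    """
    resource_id = row["PatientID"]
    if row.get("Deleted", "").strip().upper() == "Y":
        return ("delete", {"resourceType": "Patient", "id": resource_id})

    # First name plus any space-separated middle names become the given names.
    given = [row["FirstName"]] + row["MiddleNames"].split()
    resource = {
        "resourceType": "Patient",
        "id": resource_id,
        "name": [{"given": given}],
        # Unmapped source values fall back to 'unknown' rather than failing.
        "gender": GENDER_MAP.get(row["Gender"].strip().upper(), "unknown"),
    }
    return ("upsert", resource)


def transform_file(text):
    """Transform the full text of a flat file with a header row."""
    return [transform_row(r) for r in csv.DictReader(io.StringIO(text))]


# Made-up sample data for illustration only.
SAMPLE = """PatientID,FirstName,MiddleNames,Gender,Deleted
1001,Ada,Mary Jane,F,N
1002,Sam,,X,N
1003,Lee,,M,Y
"""

results = transform_file(SAMPLE)
```

Each decision hard-coded in this sketch (which column is the key, how a delete is flagged, what happens to an unmapped gender value) corresponds to a question in the list above that the publisher would need to answer.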

The DDS technical team can then use the answers provided to implement the technical transformation for the source data.

Completion and sign-off of this document is a collaborative process, involving the data publisher (or their representatives) and the DDS technical team. The data publisher will have a far greater understanding of their own data, while the DDS technical team has experience of multiple other transformations.