Inbound Data Transform Specifications

From Discovery Data Service
Revision as of 11:47, 20 April 2021 by JoC

Introduction

This article forms the basis of the technical implementation of a new inbound transform of data into the Discovery Data Service (DDS), specifically for non-transactional flat file formats. Throughout the document, yellow-highlighted text prefixed with “TODO” marks where information is required. It is fully expected that this document cannot be completed in a single pass and will need to go back and forth between the DDS team and the publisher several times before both parties agree it is complete.

The aim is to answer the following questions:

How will the source data be sent to DDS? E.g. Will it be an SFTP push or pull? Will the files be encrypted?
How often will the source data be sent to DDS? E.g. Will it be sent as a daily extract?
How should the source data be mapped to the FHIR-based DDS data format? E.g. What file contains patient demographics? What column contains first name? What column contains middle names?
How should any value sets be mapped to equivalent DDS value sets? E.g. What values are used for different genders and what FHIR gender does each map to?
What clinical coding systems are used? E.g. Are clinical observations recorded using SNOMED CT, CTV3, Read2, ICD-10, OPCS-4 or some other nationally or locally defined system?
How are source data records uniquely identified within the files? E.g. What is the primary identifier/key in each file and how do files reference each other?
How are inserts, updates and deletes represented in the source data? E.g. Is there a “deleted” column?
Is there any special knowledge required to accurately process the data?
Will the publishing of data be done in a phased approach? E.g. Will the demographics feed be turned on first, with more complex clinical structures at a later date?
What files require bulk dumps for the first extract and which will start from a point in time? E.g. Will a full dump of organisational data be possible? Will a full dump of the master patient list be possible?
Is there any overlap with an existing transform supported by DDS? E.g. Does the feed include any national standard file that DDS already supports?

Completing this document should answer the above questions. The DDS technical team will then use those answers to implement the technical transformation for the source data.

Completion and sign-off of this document is a collaborative process involving the data publisher (or their representatives) and the DDS technical team. The data publisher will have a far greater understanding of their own data, and the DDS technical team has experience of multiple other transformations.
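To make the questions about column mapping, value sets and deletes more concrete, the following is a minimal Python sketch of what the answers might feed into. It is not the DDS implementation. The file layout, the column names (`patient_id`, `first_name`, `gender`, `deleted`), the source gender codes and the `Y` delete flag are all hypothetical and would be replaced by whatever the publisher and the DDS team agree. The target values are the FHIR administrative gender codes (`male`, `female`, `other`, `unknown`).

```python
import csv
import io

# Hypothetical source value set; the real one is agreed with the publisher.
SOURCE_TO_FHIR_GENDER = {
    "M": "male",
    "F": "female",
    "O": "other",
    "U": "unknown",
}


def map_gender(source_value):
    """Map a source gender code to a FHIR administrative gender code.

    Unrecognised or missing values fall back to "unknown" (an assumption;
    a real transform might instead log or reject them).
    """
    key = (source_value or "").strip().upper()
    return SOURCE_TO_FHIR_GENDER.get(key, "unknown")


def transform_rows(csv_text):
    """Yield (action, patient) pairs from a flat-file extract.

    Assumes a hypothetical "deleted" column where "Y" marks a delete;
    every other row is treated as an insert or update ("upsert").
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        is_deleted = (row.get("deleted") or "").strip().upper() == "Y"
        action = "delete" if is_deleted else "upsert"
        patient = {
            "id": row["patient_id"],          # assumed primary key column
            "first_name": row["first_name"],  # assumed demographics column
            "gender": map_gender(row.get("gender")),
        }
        yield action, patient


# Illustrative sample data, not taken from any real feed.
SAMPLE = (
    "patient_id,first_name,gender,deleted\n"
    "1,Ann,F,N\n"
    "2,Bob,m,Y\n"
    "3,Cy,X,N\n"
)

results = list(transform_rows(SAMPLE))
```

Keeping the value-set mapping in a single lookup table means that an agreed change to the publisher's codes only touches one place, which suits a document expected to be revised several times.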