Inbound Pipeline: Difference between revisions

From Discovery Data Service
Revision as of 09:57, 22 April 2021

Introduction

The DDS inbound transformation pipeline takes raw data from publishers and transforms it into a standard format.

Input: data from multiple publishers in CSV, TSV, fixed-width, HL7v2 and other formats

Output: FHIR resources, stored as JSON in “ehr” databases, for the Outbound Transformation Processing Pipeline to use
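
To make the input-to-output step concrete, the following is a minimal sketch of turning one publisher CSV row into a FHIR Patient resource serialised as JSON. It is not the DDS transform code: the column names, the sex-code mapping and the function names are all hypothetical, and the field shapes loosely follow the FHIR Patient resource without assuming which FHIR version DDS uses. The source ID is copied straight into the resource ID here for brevity.

```python
import csv
import io
import json

# Hypothetical publisher extract; the column names are invented for illustration.
RAW_CSV = """patient_id,family_name,given_name,date_of_birth,sex
12345,Smith,Jane,1980-02-29,F
"""

# Assumed mapping from a publisher's sex codes to FHIR administrative gender codes.
GENDER_MAP = {"F": "female", "M": "male"}


def row_to_fhir_patient(row):
    """Build a minimal FHIR-style Patient resource from one CSV row."""
    return {
        "resourceType": "Patient",
        # Simplification: the publisher's own ID is used directly as the resource ID.
        "id": row["patient_id"],
        "name": [{"family": row["family_name"], "given": [row["given_name"]]}],
        "birthDate": row["date_of_birth"],
        # Unrecognised codes fall back to "unknown".
        "gender": GENDER_MAP.get(row["sex"], "unknown"),
    }


def transform(raw_csv):
    """Parse CSV text and return one JSON string per data row."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [json.dumps(row_to_fhir_patient(row), sort_keys=True) for row in reader]


if __name__ == "__main__":
    for resource_json in transform(RAW_CSV):
        print(resource_json)
```

A real transform would also need to handle the other input formats listed above (TSV, fixed-width, HL7v2), each with its own parser feeding the same kind of resource builder.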

Technical Overview

Multiple applications make up the inbound transformation pipeline. Notes on the architecture diagram:

  • White boxes are DDS applications
  • Green boxes show third-party data sources
  • Arrows show the communication method between components
  • Not all databases are shown
  • Interaction with databases is simplified

Databases

  • Multiple databases are used in the inbound transformation pipeline
  • Architecture is designed to avoid cross-database joins, so databases do not need to be co-located on the same instance
  • The production DDS instance has approximately 10 database instances
  • All database access layer code is in the EdsCore repository (except for that of the SFTP Reader and HL7 Receiver applications)
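
The point about avoiding cross-database joins can be illustrated with a minimal sketch. Two in-memory SQLite databases stand in for separate database instances, and the data is combined in application code with two independent queries rather than one SQL join. The table names, column names and sample rows are hypothetical and are not taken from the actual DDS schemas.

```python
import sqlite3

# Two independent in-memory databases stand in for separate database instances.
admin_db = sqlite3.connect(":memory:")
ehr_db = sqlite3.connect(":memory:")

# Hypothetical schema and data for the "admin" side.
admin_db.execute("CREATE TABLE service (id TEXT PRIMARY KEY, name TEXT)")
admin_db.execute("INSERT INTO service VALUES ('svc-1', 'Example Practice')")

# Hypothetical schema and data for the "ehr" side.
ehr_db.execute(
    "CREATE TABLE resource (service_id TEXT, resource_type TEXT, resource_json TEXT)"
)
ehr_db.executemany(
    "INSERT INTO resource VALUES (?, ?, ?)",
    [("svc-1", "Patient", "{}"), ("svc-1", "Observation", "{}")],
)


def resource_counts_by_service_name(service_name):
    """Combine data from both databases in application code, not in SQL."""
    # Query 1: look up the service ID in the admin database.
    row = admin_db.execute(
        "SELECT id FROM service WHERE name = ?", (service_name,)
    ).fetchone()
    if row is None:
        return {}
    # Query 2: pass that ID as an ordinary parameter to the ehr database.
    rows = ehr_db.execute(
        "SELECT resource_type, COUNT(*) FROM resource "
        "WHERE service_id = ? GROUP BY resource_type",
        (row[0],),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(resource_counts_by_service_name("Example Practice"))
```

Because neither query refers to the other database, the two connections could point at different servers without any change to the function.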
| Database Name | Platform | Description |
|---|---|---|
| Admin | MySQL | Stores details on services publishing to and subscribing from the DDS |
| Audit | MySQL | Stores audit of published data and transformations to it |
| Config | MySQL | Stores most configuration including database connection strings, logging, RabbitMQ routing |
| Eds | MySQL | Stores patient demographics (duplicated from the FHIR in the ehr database) and details of patient-person matching |
| Ehr | MySQL | Stores all patient and reference data in FHIR JSON. There is support for multiple ehr databases, with each publishing service configured to write to a specific ehr (many to one) |
| Fhir_audit | MySQL | Stores record-level audit of mapping from published data to FHIR |
| Hl7_receiver | PostgreSQL | Stores all state and configuration for HL7 Receiver application. This application is PostgreSQL-only. |
| Publisher_common | MySQL | Stores resources for transforms that are not publisher specific. For example, stores Emis code reference data, which is common to all Emis publishers. |
| Publisher_transform | MySQL | Stores persistent mappings from published data IDs to DDS UUIDs, at the service level. Each ehr database has a corresponding publisher_transform database. |
| Reference | MySQL | Provides standard reference data from TRUD and ONS e.g. SnomedCT, Read2, OPCS-4, ICD-10, postcodes |
| Sftp_reader | PostgreSQL & MySQL | Stores all state and configuration for SFTP Reader application. Note that this application can run on PostgreSQL (in DDS live) and MySQL (everywhere else) |
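
The role described for the publisher_transform database, a persistent mapping from published data IDs to DDS UUIDs at the service level, can be sketched as a get-or-create lookup. This is an illustration under stated assumptions, not the DDS implementation: the `id_map` table, its columns and the `get_or_create_uuid` function are hypothetical, and an in-memory SQLite database stands in for the persistent MySQL database.

```python
import sqlite3
import uuid

# In-memory SQLite stands in for a persistent publisher_transform database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE id_map ("
    " service_id TEXT NOT NULL,"
    " resource_type TEXT NOT NULL,"
    " source_id TEXT NOT NULL,"
    " dds_uuid TEXT NOT NULL,"
    " PRIMARY KEY (service_id, resource_type, source_id))"
)


def get_or_create_uuid(service_id, resource_type, source_id):
    """Return the UUID already mapped to a source ID, creating one if needed."""
    key = (service_id, resource_type, source_id)
    row = db.execute(
        "SELECT dds_uuid FROM id_map "
        "WHERE service_id = ? AND resource_type = ? AND source_id = ?",
        key,
    ).fetchone()
    if row is not None:
        # The same source record always resolves to the same UUID.
        return row[0]
    new_uuid = str(uuid.uuid4())
    db.execute("INSERT INTO id_map VALUES (?, ?, ?, ?)", key + (new_uuid,))
    db.commit()
    return new_uuid


if __name__ == "__main__":
    first = get_or_create_uuid("svc-1", "Patient", "12345")
    again = get_or_create_uuid("svc-1", "Patient", "12345")
    other = get_or_create_uuid("svc-2", "Patient", "12345")
    print(first == again, first != other)
```

Keying the mapping on the service as well as the source ID means two publishers that happen to reuse the same local ID still receive distinct UUIDs.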