
DIMP (De-Identification-Minimisation-Pseudonymisation)

DIMP is the act of applying the following three steps to the data for a data use project in order to preserve patient privacy:

  • De-identifying: Aggregating or transforming data to prevent re-identification (e.g. cutting off the birthdate at the month, shortening the ZIP code from 5 to 2 characters)
  • Minimising: Removing any data from a data set that is not necessary for a specific data use project (e.g. for a study that requires diagnosis codes, the free-text annotation of the diagnosis is not necessary)
  • Pseudonymising: Replacing identifiers or IDs with pseudonyms or hashed IDs to avoid direct re-identification (e.g. Patient-ID-123 -> Patient_PSEUDONYM-999)
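As a minimal illustration of the three steps, the following Python sketch applies them to a single flat record. The field names, the `dimp` function, and the HMAC-based pseudonym are assumptions made for this example only; in practice the rules are applied to FHIR resources by the tooling described in the next section.

```python
import hashlib
import hmac

# Hypothetical flat record; the field names are illustrative only.
record = {
    "patient_id": "Patient-ID-123",
    "birth_date": "1985-07-23",
    "zip": "91054",
    "diagnosis_code": "E11.9",
    "diagnosis_note": "free text written by the clinician",
}


def dimp(rec: dict, needed_fields: set, key: bytes) -> dict:
    """Apply de-identification, pseudonymisation and minimisation to one record."""
    out = dict(rec)
    # De-identify: cut the birthdate at the month, keep 2 ZIP characters.
    out["birth_date"] = out["birth_date"][:7]
    out["zip"] = out["zip"][:2]
    # Pseudonymise: replace the ID with a keyed hash (one possible technique).
    digest = hmac.new(key, out["patient_id"].encode("utf-8"), hashlib.sha256)
    out["patient_id"] = digest.hexdigest()[:32]
    # Minimise: keep only the fields this project actually needs.
    return {k: v for k, v in out.items() if k in needed_fields}


result = dimp(
    record,
    needed_fields={"patient_id", "birth_date", "zip", "diagnosis_code"},
    key=b"example-key",
)
print(result)
```

The free-text note is dropped because the hypothetical project only lists the diagnosis code among its needed fields.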

FHIR Pseudonymizer and DIMP DUP Base YAML

To support standardized data use projects (DUPs), a DIMP DUP base configuration has been created, which can be used in conjunction with the fhir-pseudonymizer to apply DIMP functions to data. It implements the DIMP pseudonymization functions required by most data use projects for the fields defined in the MII core dataset.

This configuration is provided as a guideline only and does not guarantee compliance with applicable data privacy regulations.

Depending on your specific setup or the characteristics of your data, this base configuration will likely need to be extended or adjusted to meet the requirements of your particular project and/or site.

List of applied DIMP rules:

DSC Concept: Technical ID
  FHIR Resource: All
  FHIR Element: .id
  Privacy Requirement: Crypto hash
  Description: Technical resource ID, generated and assigned by the FHIR server. Not meaningful outside the system.
  DIMP Implementation: Replace with CryptoHash
  DUP Base YAML:
    - path: Resource.id
      method: cryptoHash
      truncateToMaxLength: 32

DSC Concept: Technical References
  FHIR Resource: All
  FHIR Element: .reference
  Privacy Requirement: Crypto hash
  Description: Technical reference IDs linking resources to one another.
  DIMP Implementation: Replace with CryptoHash
  DUP Base YAML:
    - path: nodesByType('Reference').reference
      method: cryptoHash
      truncateToMaxLength: 32

DSC Concept: Reference Identifier
  FHIR Resource: All
  FHIR Element: Reference.identifier
  Privacy Requirement: Redact unless otherwise specified; see Encounter and Patient identifier rules
  Description: Logical identifier embedded in a reference. Redacted by default; specific identifier types are handled by more targeted rules below.
  DIMP Implementation: Redact
  DUP Base YAML:
    - path: nodesByType('Reference').identifier
      method: redact

DSC Concept: Encounter Identifier
  FHIR Resource: All
  FHIR Element: Encounter.identifier
  Privacy Requirement: IDAT – do not export
  Description: Logical encounter identifier, potentially a direct reference to the hospital's internal encounter ID (e.g. VN).
  DIMP Implementation: Replace via re-pseudonymization using pseudonymization software
  DUP Base YAML:
    - path: nodesByType('Identifier').where(type.coding.where(system='http://terminology.hl7.org/CodeSystem/v2-0203' and code='VN').exists()).value
      method: pseudonymize
      domain: https://my-dic-domain/identifiers/encounter-id

DSC Concept: Patient Identifier
  FHIR Resource: All
  FHIR Element: Patient.identifier
  Privacy Requirement: IDAT – do not export
  Description: Logical patient identifier, potentially a direct reference to the hospital's internal patient ID (e.g. MR).
  DIMP Implementation: Replace via re-pseudonymization using pseudonymization software
  DUP Base YAML:
    - path: nodesByType('Identifier').where(type.coding.where(system='http://terminology.hl7.org/CodeSystem/v2-0203' and code='MR').exists()).value
      method: pseudonymize
      domain: https://my-dic-domain/identifiers/patient-id

DSC Concept: Name
  FHIR Resource: Patient
  FHIR Element: Patient.name
  Privacy Requirement: IDAT – do not export
  Description: Patient name; multiple HumanName elements may be present (e.g. official, maiden, nickname).
  DIMP Implementation: Redact all HumanName nodes
  DUP Base YAML:
    - path: nodesByType('HumanName')
      method: redact

DSC Concept: Sex
  FHIR Resource: Patient
  FHIR Element: Patient.gender
  Privacy Requirement: IDAT and MDAT – export permitted
  Description: Administrative gender per the FHIR required value set (male, female, other, unknown).
  DIMP Implementation: (none listed)
  DUP Base YAML: (none listed)

DSC Concept: Date of Birth
  FHIR Resource: Patient
  FHIR Element: Patient.birthDate
  Privacy Requirement: IDAT and MDAT – generalize to at least month precision
  Description: Full date of birth of the patient. Must be generalized before export.
  DIMP Implementation: Generalize to year-month (YYYY-MM)
  DUP Base YAML:
    - path: Patient.birthDate
      method: generalize
      cases:
        "$this": "$this.toString().replaceMatches('(?<year>\\d{2,4})-(?<month>\\d{2})-(?<day>\\d{2})\\b', '${year}-${month}')"

DSC Concept: Deceased (flag)
  FHIR Resource: Patient
  FHIR Element: Patient.deceased.ofType(boolean)
  Privacy Requirement: IDAT – removal recommended per DSC; subject to further discussion
  Description: Boolean flag indicating whether the patient is deceased (true/false).
  DIMP Implementation: Keep as-is
  DUP Base YAML:
    - path: Patient.deceased.ofType(boolean)
      method: keep

DSC Concept: Deceased (date)
  FHIR Resource: Patient
  FHIR Element: Patient.deceased.ofType(dateTime)
  Privacy Requirement: IDAT – removal recommended per DSC; subject to further discussion
  Description: Date and time of death. Could potentially be generalized to month precision analogous to date of birth (open for discussion). Redacted for now.
  DIMP Implementation: Redact
  DUP Base YAML:
    - path: Patient.deceased.ofType(dateTime)
      method: redact

DSC Concept: Address
  FHIR Resource: Patient
  FHIR Element: Patient.address
  Privacy Requirement: IDAT – remove
  Description: Full address information in any form (home, work, temp, etc.).
  DIMP Implementation: Redact all Address nodes
  DUP Base YAML:
    - path: nodesByType('Address')
      method: redact

DSC Concept: Postal Code
  FHIR Resource: Patient
  FHIR Element: Patient.address.postalCode
  Privacy Requirement: IDAT and MDAT – generalize to 2 digits
  Description: Postal code component of an address. Retaining the first 2 digits preserves regional granularity while reducing re-identification risk.
  DIMP Implementation: Generalize to first 2 characters
  DUP Base YAML:
    - path: Patient.address.postalCode
      method: generalize
      cases:
        "$this": "$this.toString().substring(0,2)"

DSC Concept: Free Text
  FHIR Resource: All
  FHIR Element: nodesByType('Annotation')
  Privacy Requirement: IDAT – remove
  Description: Unstructured free-text fields such as Observation.note. May contain patient-identifiable information and cannot be reliably de-identified automatically.
  DIMP Implementation: Redact
  DUP Base YAML:
    - path: nodesByType('Annotation')
      method: redact
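
To check what the two generalize rules for Patient.birthDate and Patient.address.postalCode produce, the following Python sketch mirrors their expressions. It is an approximation rather than the FHIRPath engine itself: Python's `re` module writes named groups as `(?P<name>...)` and `\g<name>` where the rule uses `(?<name>...)` and `${name}`, and the helper names are hypothetical.

```python
import re

# Python equivalent of the pattern used in the Patient.birthDate rule.
BIRTHDATE_PATTERN = re.compile(
    r"(?P<year>\d{2,4})-(?P<month>\d{2})-(?P<day>\d{2})\b"
)


def generalize_birth_date(value: str) -> str:
    """Drop the day component from a full date, keeping year and month."""
    return BIRTHDATE_PATTERN.sub(r"\g<year>-\g<month>", value)


def generalize_postal_code(value: str) -> str:
    """Keep the first two characters, like substring(0,2) in the rule."""
    return value[:2]


print(generalize_birth_date("1985-07-23"))
print(generalize_birth_date("1985-07"))
print(generalize_postal_code("91054"))
```

A value that is already limited to year and month does not match the pattern and is therefore left unchanged.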

Using and Customizing the DUP YAML

The DUP base YAML file included in the repository is a starting point, not a final configuration. Each site or project needs to adapt it to meet its specific requirements.

Setup

The DIMP configuration must be mounted into the fhir-pseudonymizer container at startup. See this example for how to do this. After changing the mounted file, restart the fhir-pseudonymizer for the changes to take effect.


Re-Pseudonymization in the CDS

Patients in the CDS may have multiple identifiers, each of which may need to be re-pseudonymized differently depending on your project's requirements. For each identifier type, your site should create a dedicated pseudonym namespace.

There are three ways to handle each identifier:

| Option | When to use |
|---|---|
| Don't re-pseudonymize | The identifier is already pseudonymized and no additional per-project pseudonymization is needed |
| Re-pseudonymize for extraction (shared namespace across projects) | The identifier is not yet pseudonymized, or your site requires pseudonymization for data extractions generally |
| Re-pseudonymize per DUP project (separate namespace per project) | The identifier is not yet pseudonymized, or your site requires a distinct pseudonym for each individual DUP project |

Identifier Reference Table

The CDS defines the following standard patient identifiers. Check which ones your site actually uses:

| Profile field (slice) | DIMP FHIR path |
|---|---|
| Patient.identifier:pid | nodesByType('Identifier').where(type.coding.where(system='http://terminology.hl7.org/CodeSystem/v2-0203' and code='MR').exists()).value |
| Patient.identifier:PseudonymisierterIdentifier | nodesByType('Identifier').where(type.coding.where(system='http://terminology.hl7.org/CodeSystem/v3-ObservationValue' and code='PSEUDED').exists()).value |
| Patient.identifier:AnonymisierterIdentifier | nodesByType('Identifier').where(type.coding.where(system='http://terminology.hl7.org/CodeSystem/v3-ObservationValue' and code='ANONYED').exists()).value |

Always redacted: Patient.identifier:versichertenId and Patient.identifier:MaskierterVersichertenIdentifier are always removed using the FHIR path: nodesByType('Identifier').where(type.coding.where(system='http://fhir.de/CodeSystem/identifier-type-de-basis' and (code='GKV' or code='PKV' or code='KVZ10')).exists())

Site-specific identifiers: Any additional identifiers your site has added that are not defined as a slice in the CDS profile must be removed during the DIMP process. This is your site's responsibility.
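
To find out which identifier types actually occur in your data, a small script along the following lines can help. This is a sketch only: it works on a Patient resource loaded as a Python dict, the function name and the sample resource are hypothetical, and it evaluates the same system and code pairs as the FHIR paths above without using a FHIRPath engine.

```python
V2_0203 = "http://terminology.hl7.org/CodeSystem/v2-0203"
V3_OBSERVATION_VALUE = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"
DE_BASIS = "http://fhir.de/CodeSystem/identifier-type-de-basis"

# (system, code) pairs of the standard CDS patient identifier slices.
CDS_SLICES = {
    (V2_0203, "MR"): "Patient.identifier:pid",
    (V3_OBSERVATION_VALUE, "PSEUDED"): "Patient.identifier:PseudonymisierterIdentifier",
    (V3_OBSERVATION_VALUE, "ANONYED"): "Patient.identifier:AnonymisierterIdentifier",
}
# (system, code) pairs of the identifiers that are always removed.
ALWAYS_REDACTED = {(DE_BASIS, "GKV"), (DE_BASIS, "PKV"), (DE_BASIS, "KVZ10")}


def classify_identifier(identifier: dict) -> str:
    """Return the handling category for one Identifier element."""
    codings = identifier.get("type", {}).get("coding", [])
    keys = {(c.get("system"), c.get("code")) for c in codings}
    if keys & ALWAYS_REDACTED:
        return "always redacted"
    for key, slice_name in CDS_SLICES.items():
        if key in keys:
            return slice_name
    return "site-specific (must be removed)"


# Hypothetical sample resource for illustration.
patient = {
    "resourceType": "Patient",
    "identifier": [
        {"type": {"coding": [{"system": V2_0203, "code": "MR"}]}, "value": "123"},
        {"type": {"coding": [{"system": DE_BASIS, "code": "GKV"}]}, "value": "A1"},
        {"system": "https://example.org/local-id", "value": "xyz"},
    ],
}

for ident in patient["identifier"]:
    print(classify_identifier(ident))
```

Anything reported as site-specific has no matching CDS slice and needs its own removal rule in your DUP YAML.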

Configuration Checklist

  1. Identify which patient identifiers your site uses
  2. Update your DUP YAML to reflect the correct re-pseudonymization approach for each
  3. If using per-project namespaces, create those namespaces in your pseudonymization service (e.g. vfps, gPAS, Entici) before running the DIMP step. If a namespace is missing, the fhir-pseudonymizer will fail and break the pipeline
  4. For instructions on creating namespaces in vfps, see this guide

Working with DIMPed data and re-identification

Once data has been DIMPed for a DUP, the data set no longer contains any original technical IDs or identifiers. Additional steps are therefore required for debugging and for checking that the data extraction is correct (for example, consent compliance).

INFO

The technical ID (the id element) is a technical identifier that the FHIR server uses to identify a data entry. It has no direct correspondence to the primary data in the hospital, does not contain sensitive information, and is commonly generated on load into the FHIR server. It is each site's responsibility to assess whether crypto-hashing its technical IDs is sufficient. This ID should not be confused with a logical identifier for the patient, such as the medical record number (MR). Identifiers can be used to re-identify a patient in the hospital. They have to be added to DUP data sets for re-identification purposes, for example in case of a withdrawal (German: "Widerruf").

Given any data set in DIMPed FHIR or CSV format (aether job step folders dimp and csv), the technical IDs cannot be reversed. However, if you are looking for a particular ID from your original data in your final data set, you can compute its hash with the following command:

```bash
echo -n "<ORIGINAL_ID>" | openssl dgst -sha256 -hmac "<YOUR_KEY>"
```

The key is configured as part of the pseudonymizer via the env variable Anonymization__CryptoHashKey.
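
The same lookup can be sketched in Python. Because the base configuration sets truncateToMaxLength: 32, the hashed ID in the DIMPed data is presumably shorter than the 64 hexadecimal characters of a full SHA-256 HMAC. The sketch below assumes that truncation keeps the leading characters; this is an assumption to verify against your own output, and the function name is hypothetical.

```python
import hashlib
import hmac


def crypto_hash(original_id: str, key: str, max_length: int = 32) -> str:
    """HMAC-SHA256 of the ID as lowercase hex, cut to max_length characters."""
    digest = hmac.new(
        key.encode("utf-8"), original_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Assumption: truncation keeps the first max_length characters.
    return digest[:max_length]


# Replace the placeholders with a real ID and the configured key.
print(crypto_hash("<ORIGINAL_ID>", "<YOUR_KEY>"))
```

If the truncated value does not appear in the data set, compare the full 64-character digest (max_length=64) with the hashed IDs to see how your deployment shortens them.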

To re-identify a re-pseudonymized identifier, you have to use your specific pseudonymisation service.

For vfps this is the following call:

```bash
curl --request GET \
  --url http://localhost:8089/v1/namespaces/my-namespace/pseudonyms/my-identifier \
  --header 'content-type: application/json'
```

For example:

```bash
curl --request GET \
  --url http://localhost:8089/v1/namespaces/my-dic-patient-namespace/pseudonyms/stringmlBC83Vba42cr4r8TkNMf65UNP9b3LNAIxfo0zKzk2NQp1IjT-a7ywstring \
  --header 'content-type: application/json'
```
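
When many pseudonyms have to be resolved, the call can be scripted. The following Python sketch builds the same URL as the curl command and percent-encodes the namespace and pseudonym, which is a precaution for values containing characters such as "/". The function names are hypothetical, the base URL is the one from the example above, and the shape of the JSON response is not described here, so the sketch returns it unparsed beyond JSON decoding.

```python
import json
import urllib.parse
import urllib.request


def pseudonym_url(base_url: str, namespace: str, pseudonym: str) -> str:
    """Build the lookup URL used in the curl example above."""
    ns = urllib.parse.quote(namespace, safe="")
    ps = urllib.parse.quote(pseudonym, safe="")
    return f"{base_url.rstrip('/')}/v1/namespaces/{ns}/pseudonyms/{ps}"


def lookup_pseudonym(base_url: str, namespace: str, pseudonym: str):
    """Send the GET request and return the decoded JSON body as-is."""
    request = urllib.request.Request(
        pseudonym_url(base_url, namespace, pseudonym),
        headers={"content-type": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Only prints the URL; call lookup_pseudonym(...) against a running service.
print(pseudonym_url("http://localhost:8089", "my-namespace", "my-identifier"))
```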