DIMP
DIMP (De-identify, Minimize, Pseudonymize) de-identifies, minimizes, and pseudonymizes FHIR data, protecting patient privacy while keeping the data useful for research.
What DIMP Does
- Removes or masks identifying information (names, addresses, etc.)
- Generates consistent pseudonyms for patient identifiers
- Preserves clinical data (diagnoses, procedures, lab values)
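To illustrate what "consistent pseudonyms" means, here is a minimal Python sketch. It is not DIMP's actual algorithm, which this page does not describe. It assumes a keyed hash (HMAC-SHA256), and the function name `pseudonymize`, the key, and the identifiers are all hypothetical. The only property it shows is that the same identifier always maps to the same pseudonym.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 16) -> str:
    """Return a deterministic pseudonym for an identifier (illustration only)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncate the hex digest to keep the pseudonym short and readable.
    return digest.hexdigest()[:length]


if __name__ == "__main__":
    key = b"example-secret-key"  # hypothetical key, not a DIMP setting
    first = pseudonymize("patient-123", key)
    second = pseudonymize("patient-123", key)
    other = pseudonymize("patient-456", key)
    print(first == second)  # the same identifier yields the same pseudonym
    print(first == other)   # a different identifier yields a different one
```

Because the mapping is deterministic for a given key, resources that refer to the same patient can still be linked after pseudonymization.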
Configuration
Add DIMP to your aether.yaml:
services:
  dimp:
    url: "http://your-dimp-server:32861/fhir"
pipeline:
  enabled_steps:
    - torch # or local_import
    - dimp # Pseudonymize after import
jobs_dir: "./jobs"
Running Pseudonymization
aether pipeline start your-query.crtdl
Aether will:
- Extract data from TORCH (or import it from files)
- Send it to DIMP for pseudonymization
- Save the pseudonymized data in the jobs folder
Output
Results are saved in:
jobs/<job-id>/
├── status.json # Job status
└── dimp_results.ndjson # Pseudonymized data
CRTDL Preprocessing
DIMP requires certain attributes (like Patient.identifier) to be present in the extracted FHIR data. If your CRTDL query doesn't include these attributes, DIMP pseudonymization will fail.
CRTDL preprocessing automatically enriches your CRTDL with the required attributes before sending it to TORCH:
services:
  crtdl_preprocessing:
    enabled: true
    enrichments:
      - group_reference: "https://www.medizininformatik-initiative.de/fhir/core/modul-person/StructureDefinition/PatientPseudonymisiert"
        create_if_not_exists:
          group_name: "PatientPseudonymisiert"
        attributes_to_add:
          - attribute_ref: "Patient.identifier"
            must_have: false
      - group_reference: "https://www.medizininformatik-initiative.de/fhir/core/modul-fall/StructureDefinition/KontaktGesundheitseinrichtung"
        attributes_to_add:
          - attribute_ref: "Encounter.identifier"
            must_have: false
The create_if_not_exists option creates the group in the CRTDL if it doesn't already exist. This is useful for groups like PatientPseudonymisiert that may not be part of the original research query but are needed by DIMP.
Enrichment rules can also be loaded from an external JSON file. See CRTDL Preprocessing in the configuration reference for details.
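The effect of the enrichment rules above can be sketched in Python. This is not Aether's implementation. The function name `enrich_crtdl` is hypothetical, and the CRTDL field names used here (`dataExtraction`, `attributeGroups`, `groupReference`, `attributes`, `attributeRef`, `mustHave`) are assumptions about the CRTDL layout. The sketch also assumes that a rule without create_if_not_exists is skipped when its group is missing.

```python
import copy


def enrich_crtdl(crtdl: dict, enrichments: list) -> dict:
    """Return a copy of the CRTDL with the requested attributes added."""
    result = copy.deepcopy(crtdl)  # leave the caller's CRTDL untouched
    groups = result.setdefault("dataExtraction", {}).setdefault("attributeGroups", [])
    for rule in enrichments:
        group = next(
            (g for g in groups if g.get("groupReference") == rule["group_reference"]),
            None,
        )
        if group is None:
            create = rule.get("create_if_not_exists")
            if not create:
                continue  # assumption: no matching group and no create option
            group = {
                "name": create["group_name"],
                "groupReference": rule["group_reference"],
                "attributes": [],
            }
            groups.append(group)
        attributes = group.setdefault("attributes", [])
        existing = {a.get("attributeRef") for a in attributes}
        for attr in rule.get("attributes_to_add", []):
            if attr["attribute_ref"] in existing:
                continue  # do not add the same attribute twice
            attributes.append(
                {
                    "attributeRef": attr["attribute_ref"],
                    "mustHave": attr.get("must_have", False),
                }
            )
            existing.add(attr["attribute_ref"])
    return result
```

Working on a deep copy keeps the original query unchanged, so the enriched version can be sent on while the research query as written stays available for comparison.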
Large Bundles
For large datasets, Aether automatically splits bundles that exceed a configurable size threshold before sending them to DIMP:
services:
  dimp:
    url: "http://your-dimp-server:32861/fhir"
    bundle_split_threshold_mb: 10 # Split bundles larger than 10 MB
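As a rough picture of size-based splitting, the Python sketch below divides a FHIR Bundle's `entry` list into smaller bundles once the serialized size exceeds a threshold. It is not Aether's actual splitting logic. The function name `split_bundle` is hypothetical, sizes are measured on the JSON serialization, and the small overhead of the bundle envelope is ignored when filling each chunk.

```python
import json


def split_bundle(bundle: dict, threshold_mb: float) -> list:
    """Split a bundle into smaller bundles of roughly threshold_mb each."""
    limit = int(threshold_mb * 1024 * 1024)
    entries = bundle.get("entry", [])
    # Small or empty bundles are passed through unchanged.
    if not entries or len(json.dumps(bundle).encode("utf-8")) <= limit:
        return [bundle]

    chunks, current, current_size = [], [], 0
    for entry in entries:
        size = len(json.dumps(entry).encode("utf-8"))
        # Start a new chunk when adding this entry would exceed the limit.
        if current and current_size + size > limit:
            chunks.append(current)
            current, current_size = [], 0
        current.append(entry)  # an oversized single entry gets its own chunk
        current_size += size
    if current:
        chunks.append(current)

    # Copy every top-level field except "entry" into each smaller bundle.
    header = {k: v for k, v in bundle.items() if k != "entry"}
    return [{**header, "entry": chunk} for chunk in chunks]
```

Entries keep their original order across the resulting bundles, so concatenating the `entry` lists of the parts reproduces the original list.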