DIMP

DIMP (De-identify, Minimize, Pseudonymize) is a service that de-identifies, minimizes, and pseudonymizes FHIR data, protecting patient privacy while keeping the data useful for research.

What DIMP Does

- Removes or masks identifying information (names, addresses, etc.)
- Generates consistent pseudonyms for patient identifiers
- Preserves clinical data (diagnoses, procedures, lab values)
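
To make "consistent pseudonyms" concrete: the same input identifier always maps to the same pseudonym, so records belonging to one patient stay linkable after pseudonymization. The sketch below illustrates the idea with a keyed hash (HMAC-SHA256). It is not DIMP's actual algorithm; the key, the function names, and the list of dropped fields are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key; a real deployment would load it from secure storage.
SECRET_KEY = b"example-secret-do-not-use"

# Hypothetical set of directly identifying Patient fields to drop.
IDENTIFYING_FIELDS = ("name", "address", "telecom")


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic pseudonym for an identifier (keyed hash)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def protect_patient(patient: dict) -> dict:
    """Drop identifying fields, pseudonymize identifiers, keep everything else."""
    protected = {k: v for k, v in patient.items() if k not in IDENTIFYING_FIELDS}
    if "identifier" in patient:
        protected["identifier"] = [
            {**ident, "value": pseudonymize(ident["value"])}
            for ident in patient["identifier"]
        ]
    return protected
```

Because the hash is keyed, the pseudonym cannot be recomputed without the secret, while clinical fields such as `gender` or linked observations pass through unchanged.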

Configuration

Add DIMP to your aether.yaml:

```yaml
services:
  dimp:
    url: "http://your-dimp-server:32861/fhir"

pipeline:
  enabled_steps:
    - torch   # or local_import
    - dimp    # Pseudonymize after import

jobs_dir: "./jobs"
```
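
The configuration above implies two things: the DIMP service needs a URL, and the `dimp` step has to run after an import step. The following sketch checks both on an already parsed configuration (a plain dictionary). It is not Aether's own validation logic; the function name `check_dimp_config` and the exact rules are assumptions made for illustration.

```python
# Import steps named in the configuration example above.
IMPORT_STEPS = {"torch", "local_import"}


def check_dimp_config(config: dict) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    steps = config.get("pipeline", {}).get("enabled_steps", [])
    if "dimp" not in steps:
        return problems  # DIMP is not enabled, so there is nothing to check

    url = config.get("services", {}).get("dimp", {}).get("url", "")
    if not url.startswith(("http://", "https://")):
        problems.append("services.dimp.url must be an http(s) URL")

    import_positions = [i for i, step in enumerate(steps) if step in IMPORT_STEPS]
    if not import_positions:
        problems.append("no import step (torch or local_import) is enabled")
    elif steps.index("dimp") < min(import_positions):
        problems.append("dimp must come after the import step")
    return problems
```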

Running Pseudonymization

```bash
aether pipeline start your-query.crtdl
```

Aether will:

1. Extract data from TORCH (or import it from local files)
2. Send it to DIMP for de-identification and pseudonymization
3. Save the protected data in the jobs folder
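
For orientation, the "send it to DIMP" step can be pictured as an HTTP POST of a FHIR resource to the configured service URL. The sketch below only builds such a request without sending it. It assumes the DIMP service exposes a FHIR `$de-identify` operation under the configured URL, as the FHIR Pseudonymizer does; the operation name, the content type, and the function name are assumptions, so verify them against your deployment.

```python
import json
import urllib.request


def build_deidentify_request(base_url: str, resource: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for one FHIR resource.

    Assumption: the service accepts POST <base_url>/$de-identify.
    """
    endpoint = base_url.rstrip("/") + "/$de-identify"
    body = json.dumps(resource).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/fhir+json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call against a running service.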

Output

Results are saved in:

jobs/<job-id>/
├── status.json          # Job status
└── dimp_results.ndjson  # Pseudonymized data
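
A quick way to inspect the output is to count resources by type. The sketch below assumes each non-empty line of `dimp_results.ndjson` holds one FHIR resource as JSON (the usual NDJSON convention); the function name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_resource_types(ndjson_path) -> Counter:
    """Count FHIR resources per resourceType in an NDJSON file."""
    counts = Counter()
    with Path(ndjson_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            resource = json.loads(line)
            counts[resource.get("resourceType", "unknown")] += 1
    return counts
```

For example, `count_resource_types("jobs/<job-id>/dimp_results.ndjson")` with the job ID filled in returns a `Counter` keyed by resource type.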

CRTDL Preprocessing

DIMP requires certain attributes (like Patient.identifier) to be present in the extracted FHIR data. If your CRTDL query doesn't include these attributes, DIMP pseudonymization will fail.

CRTDL preprocessing automatically enriches your CRTDL with the required attributes before sending it to TORCH:

```yaml
services:
  crtdl_preprocessing:
    enabled: true
    enrichments:
      - group_reference: "https://www.medizininformatik-initiative.de/fhir/core/modul-person/StructureDefinition/PatientPseudonymisiert"
        create_if_not_exists:
          group_name: "PatientPseudonymisiert"
        attributes_to_add:
          - attribute_ref: "Patient.identifier"
            must_have: false
      - group_reference: "https://www.medizininformatik-initiative.de/fhir/core/modul-fall/StructureDefinition/KontaktGesundheitseinrichtung"
        attributes_to_add:
          - attribute_ref: "Encounter.identifier"
            must_have: false
```

The create_if_not_exists option creates the group in the CRTDL if it doesn't already exist. This is useful for groups like PatientPseudonymisiert that may not be part of the original research query but are needed by DIMP.

Enrichment rules can also be loaded from an external JSON file. See CRTDL Preprocessing in the configuration reference for details.
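
The effect of the enrichment rules can be sketched as a small transformation of the CRTDL document. The example assumes a CRTDL layout with `dataExtraction.attributeGroups`, where each group has `groupReference`, `name`, and a list of `attributes` with `attributeRef` and `mustHave`. It also assumes that a rule is skipped when its group is missing and `create_if_not_exists` is not set. Both are assumptions for illustration, not a description of Aether's implementation.

```python
import copy


def enrich_crtdl(crtdl: dict, enrichments: list) -> dict:
    """Return a copy of the CRTDL with the attributes from the rules added."""
    result = copy.deepcopy(crtdl)  # leave the caller's CRTDL untouched
    groups = result.setdefault("dataExtraction", {}).setdefault("attributeGroups", [])

    for rule in enrichments:
        ref = rule["group_reference"]
        group = next((g for g in groups if g.get("groupReference") == ref), None)

        if group is None:
            create = rule.get("create_if_not_exists")
            if not create:
                continue  # assumed behavior: skip rules for absent groups
            group = {"name": create["group_name"], "groupReference": ref, "attributes": []}
            groups.append(group)

        attributes = group.setdefault("attributes", [])
        existing = {a.get("attributeRef") for a in attributes}
        for attr in rule.get("attributes_to_add", []):
            if attr["attribute_ref"] not in existing:  # avoid duplicates
                attributes.append(
                    {"attributeRef": attr["attribute_ref"], "mustHave": attr.get("must_have", False)}
                )
                existing.add(attr["attribute_ref"])
    return result
```

Applied to a query that lacks the PatientPseudonymisiert group, the first rule from the configuration above would append a new group containing `Patient.identifier`, while groups and attributes already present are left as they are.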

Large Bundles

For large datasets, Aether automatically splits bundles that exceed a configurable size threshold before sending them to DIMP:

```yaml
services:
  dimp:
    url: "http://your-dimp-server:32861/fhir"
    bundle_split_threshold_mb: 10   # Split bundles larger than 10 MB
```
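
Size-based splitting can be pictured as greedily packing Bundle entries into chunks that stay under the threshold. The sketch below is one possible approach, not Aether's actual splitting algorithm. It assumes 1 MB = 1024 × 1024 bytes, measures only the serialized entries (not the Bundle envelope), and uses a hypothetical function name.

```python
import json


def split_bundle(bundle: dict, threshold_mb: float) -> list:
    """Split a FHIR Bundle into smaller Bundles based on serialized entry size."""
    limit = int(threshold_mb * 1024 * 1024)  # assumption: binary megabytes
    if len(json.dumps(bundle).encode("utf-8")) <= limit:
        return [bundle]  # small enough, nothing to split

    # Everything except the entries is copied into each chunk.
    envelope = {k: v for k, v in bundle.items() if k != "entry"}
    chunks, current, current_size = [], [], 0

    for entry in bundle.get("entry", []):
        size = len(json.dumps(entry).encode("utf-8"))
        # Start a new chunk when adding this entry would exceed the limit.
        if current and current_size + size > limit:
            chunks.append({**envelope, "entry": current})
            current, current_size = [], 0
        current.append(entry)
        current_size += size

    if current:
        chunks.append({**envelope, "entry": current})
    return chunks
```

A single entry larger than the threshold still ends up in a chunk of its own, since an individual resource cannot be divided further in this sketch.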
