INFO

This English text is a translation of the original German document, Pseudonymisierung, which was written to coordinate with our Data Protection Officer. It is provided for convenience; in case of discrepancies, the German version takes precedence.

Pseudonymization

In pseudonymization, the original IDs (oID) from the clinical domain (CD) are replaced with pseudonyms (sID) in the research domain (RD).
The process is designed with the help of a Trusted Center (TC) so that

  • the CD has no knowledge of the sIDs,
  • the RD has no knowledge of the oIDs,
  • and the TC itself has no access to the medical data content (see [1]).

Requirements

  1. The ID exchange process is handled via a Trusted Center Agent (TCA).
  2. Re-identification must be possible via the Trusted Center.
  3. The sIDs must remain consistent across repeated transmissions.

Transmission Process

Data transmission is managed by one agent in each domain: the clinical domain agent (CDA), the research domain agent (RDA), and the Trusted Center Agent (TCA).

In a transmission process, the CDA sends a list of oIDs to be pseudonymized to the TCA.
The patient ID (PID) is handled separately from the other resource IDs, as it is used for re-identification.

The TCA generates a pseudonym (sPID) for the original patient ID (oPID).
For the oIDs of all other resources, the sID is calculated as a salted hash of the oID.
The TCA generates the salt, which protects the resulting sIDs from brute-force attacks.

Next, the TCA generates a transport ID (tID) for each oID and sends the transport mapping (tMap: oID ➙ tID) back to the CDA.
A secure mapping (sMap: tID ➙ sID) is created for the research domain.

The identifier for the sMap (sMapName) is sent to the CDA.
The CDA first replaces the oIDs with tIDs in the patient bundle, then sends the transport-pseudonymized patient bundle and sMapName to the RDA.
Upon receipt, the RDA requests the corresponding sMap and replaces the tIDs with sIDs.

The following diagram illustrates the transmission process in detail:

Generation of Transport and Pseudonymized IDs

sID

The TCA uses gPAS to generate and store pseudonyms. For each patient, two pseudonyms are generated:

oPID ➙ sPID
"Salt_" + oPID ➙ Salt

The keys used are the oPID of the patient and the concatenation of the literal "Salt_" with the oPID. Note: "Salt_" is a fixed string, not a variable or real salt.

The first pseudonym maps the patient oPID directly to a sPID and can be used for re-identification. The second pseudonym acts as a salt for the other resource IDs:

Resource sID = SHA256(Salt + oID)
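The derivation above can be sketched in Python. This is a minimal illustration, not the TCA's actual implementation: `PseudonymStore` is a hypothetical in-memory stand-in for gPAS, and the pseudonym alphabet, the pseudonym length, the UTF-8 encoding, and the hex output format are all assumptions that the text does not specify.

```python
import hashlib
import secrets
import string

# Assumed pseudonym alphabet; the real gPAS configuration may differ.
ALPHABET = string.ascii_lowercase + string.digits


class PseudonymStore:
    """Hypothetical stand-in for gPAS: one stable random pseudonym per key."""

    def __init__(self, length: int = 16) -> None:
        self._length = length
        self._pseudonyms: dict[str, str] = {}

    def get_or_create(self, key: str) -> str:
        # Generate a pseudonym on first use, then always return the same one.
        if key not in self._pseudonyms:
            self._pseudonyms[key] = "".join(
                secrets.choice(ALPHABET) for _ in range(self._length)
            )
        return self._pseudonyms[key]


def resource_sid(salt: str, oid: str) -> str:
    """SHA256(Salt + oID); UTF-8 input and hex output are assumptions."""
    return hashlib.sha256((salt + oid).encode("utf-8")).hexdigest()


store = PseudonymStore()
opid = "1"
spid = store.get_or_create(opid)            # oPID -> sPID
salt = store.get_or_create("Salt_" + opid)  # "Salt_" + oPID -> Salt
encounter_sid = resource_sid(salt, "2")
medication_sid = resource_sid(salt, "3")
```

Because both pseudonyms are stored rather than regenerated, repeating the call for the same patient yields the same sPID and the same salt, which is what keeps the resource sIDs consistent across transmissions.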

Security Note

The combination of alphabet size $A$ and salt length $n$, i.e., $A^n$ possible variants, must be chosen large enough to resist brute-force attacks (see Security Aspects).

Example

Suppose there is a patient in the CDA with two resources:

Patient:
  oID = 1,
  Resources:
  [
    Encounter: oID = 2,
    MedicationAdministration: oID = 3
  ]

The CDA sends the oIDs (1, 2, 3) to the TCA. The TCA generates:

1 ➙ d7dsjdg4
Salt_1 ➙ 5kf8344f

Using the salt, the TCA computes:

2 ➙ SHA256(5kf8344f2)
3 ➙ SHA256(5kf8344f3)

tID

For each oID, a random number is generated as the tID.

The mapping:

oID ➙ tID

is temporarily stored in a key-value store. Thus, tIDs can vary on repeated transfers. The retention time of tIDs is configurable in the TCA.
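A minimal sketch of such a store, under stated assumptions: `TransportIdStore` and its methods are hypothetical names, the 8-digit tID format is taken only from the example values in this document, and the choice to issue fresh tIDs on every transfer (rather than reuse unexpired ones) is an interpretation of "tIDs can vary on repeated transfers".

```python
import secrets
import time
from typing import Optional


class TransportIdStore:
    """Hypothetical key-value store for oID -> tID with a retention time."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[str, float]] = {}  # oID -> (tID, expiry)

    def new_transfer(self, oids: list[str], now: Optional[float] = None) -> dict[str, str]:
        """Issue a fresh random tID for each oID and return the tMap."""
        now = time.monotonic() if now is None else now
        tmap: dict[str, str] = {}
        for oid in oids:
            tid = f"{secrets.randbelow(10**8):08d}"
            while tid in tmap.values():  # keep tIDs unique within one transfer
                tid = f"{secrets.randbelow(10**8):08d}"
            tmap[oid] = tid
            self._entries[oid] = (tid, now + self._ttl)
        return tmap

    def lookup(self, oid: str, now: Optional[float] = None) -> Optional[str]:
        """Return the stored tID, or None once the retention time has passed."""
        now = time.monotonic() if now is None else now
        entry = self._entries.get(oid)
        if entry is None or entry[1] <= now:
            self._entries.pop(oid, None)
            return None
        return entry[0]


tid_store = TransportIdStore(ttl_seconds=60)
first_tmap = tid_store.new_transfer(["1", "2", "3"], now=0.0)
still_valid = tid_store.lookup("1", now=30.0)  # within the retention time
expired = tid_store.lookup("1", now=61.0)      # after the retention time
```

The explicit `now` parameter exists only to make the expiry behaviour easy to demonstrate; a real store would rely on its own clock or on the expiry feature of the underlying key-value store.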

Transport Mapping: Replacing oIDs with tIDs

Once the CDA sends the oIDs to be pseudonymized to the TCA, temporary transport IDs (tIDs) are generated and sent back with the sMapName. The CDA replaces the oIDs with tIDs and sends the data and sMapName to the RDA.

Example Transport Mapping:

1 ➙ 84613221
2 ➙ 34186571
3 ➙ 97354168

transport-Patient:
  tID = 84613221,
  Resources:
  [
    Encounter: tID = 34186571,
    MedicationAdministration: tID = 97354168
  ]

Secure Mapping

After receiving the transport-pseudonymized bundle, the RDA requests the sMap using the sMapName and replaces the tIDs with sIDs. The sIDs are intended for research purposes and remain stable across transfers.

Example Secure Mapping:

84613221 ➙ d7dsjdg4
34186571 ➙ SHA256(5kf8344f2)
97354168 ➙ SHA256(5kf8344f3)

research-Patient:
 sID = d7dsjdg4,
 Resources: [
  Encounter: sID = SHA256(5kf8344f2),
  MedicationAdministration: sID = SHA256(5kf8344f3)
 ]
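The two replacement steps can be sketched with the example values from this section. The dictionary layout of the bundle and the helper names `replace_ids` and `sha256_hex` are illustrative assumptions; real bundles are FHIR resources, and the hash input encoding (UTF-8) and output format (hex) are not specified in the text.

```python
import hashlib


def replace_ids(bundle: dict, mapping: dict) -> dict:
    """Return a new bundle in which every ID is looked up in `mapping`."""
    return {
        "id": mapping[bundle["id"]],
        "resources": {
            resource_type: mapping[resource_id]
            for resource_type, resource_id in bundle["resources"].items()
        },
    }


def sha256_hex(text: str) -> str:
    """Hex-encoded SHA256 of a UTF-8 string (encoding is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Patient bundle as it exists in the clinical domain (oIDs).
clinical_patient = {
    "id": "1",
    "resources": {"Encounter": "2", "MedicationAdministration": "3"},
}

# tMap (oID -> tID), returned by the TCA to the CDA.
t_map = {"1": "84613221", "2": "34186571", "3": "97354168"}

# sMap (tID -> sID), fetched by the RDA from the TCA via the sMapName.
s_map = {
    "84613221": "d7dsjdg4",
    "34186571": sha256_hex("5kf8344f" + "2"),
    "97354168": sha256_hex("5kf8344f" + "3"),
}

transport_patient = replace_ids(clinical_patient, t_map)  # step done by the CDA
research_patient = replace_ids(transport_patient, s_map)  # step done by the RDA
print(research_patient["id"])  # d7dsjdg4
```

Note that the CDA only ever handles `t_map` and the RDA only ever handles `s_map`, so neither side sees the other's identifiers.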

Security Aspects

Salt Brute-Forcing

Suppose an attacker knows both the oIDs and the sIDs and attempts to determine the mapping between them by brute force.

The time T needed to determine the salt is given by:

$$T = \frac{A^n}{v}$$

where A is the alphabet size, n the salt length, and v the number of hashes per second.

As of 2025, hardware capable of $10^{15}$ SHA256 hashes per second is realistically available for USD 25,000, consuming 15 J/TH. These figures are based on SHA256 Bitcoin mining hardware.

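| Alphabet Size (A) | Length (n) | Possible Combinations ($A^n$) | Time @ $10^{15}$ Hashes/sec | Energy Use (15 J/TH) in kWh |
|---|---|---|---|---|
| 26 (lowercase) | 12 | $26^{12} \approx 9.5 \cdot 10^{16}$ | 95 s | $4.0 \cdot 10^{-1}$ |
| 26 (lowercase) | 16 | $26^{16} \approx 4.4 \cdot 10^{22}$ | 1.4 years | $1.8 \cdot 10^{5}$ |
| 26 (lowercase) | 24 | $26^{24} \approx 9.1 \cdot 10^{33}$ | $2.9 \cdot 10^{11}$ years | $3.8 \cdot 10^{16}$ |
| 62 (alphanumeric) | 12 | $62^{12} \approx 3.2 \cdot 10^{21}$ | 37 days | $1.3 \cdot 10^{4}$ |
| 62 (alphanumeric) | 16 | $62^{16} \approx 4.8 \cdot 10^{28}$ | $1.5 \cdot 10^{6}$ years | $2.0 \cdot 10^{11}$ |
| 62 (alphanumeric) | 24 | $62^{24} \approx 1.0 \cdot 10^{43}$ | $3 \cdot 10^{20}$ years | $4.3 \cdot 10^{25}$ |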
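Estimates of this kind can be recomputed with a few lines of Python. The sketch below assumes the hash rate and energy figures quoted above, a year of 365.25 days, and an attacker who has to try every possible salt; the function name `brute_force_cost` is illustrative.

```python
HASH_RATE = 1e15            # hashes per second (figure quoted in the text)
JOULES_PER_TERAHASH = 15.0  # J/TH (figure quoted in the text)
SECONDS_PER_YEAR = 365.25 * 24 * 3600
JOULES_PER_KWH = 3.6e6


def brute_force_cost(alphabet_size: int, salt_length: int) -> tuple[int, float, float]:
    """Return (combinations, seconds, kWh) for trying every possible salt."""
    combinations = alphabet_size ** salt_length
    seconds = combinations / HASH_RATE  # T = A^n / v
    kwh = combinations / 1e12 * JOULES_PER_TERAHASH / JOULES_PER_KWH
    return combinations, seconds, kwh


for a, n in [(26, 12), (26, 16), (26, 24), (62, 12), (62, 16), (62, 24)]:
    combos, seconds, kwh = brute_force_cost(a, n)
    print(
        f"A={a:2d} n={n:2d}: {combos:.1e} combinations, "
        f"{seconds:.1e} s ({seconds / SECONDS_PER_YEAR:.1e} years), {kwh:.1e} kWh"
    )
```

These are exhaustive-search figures; on average an attacker would find the salt after searching about half of the space, so the expected time and energy are roughly half of the values computed here.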

References