Configuration Reference
Complete reference for all Aether configuration options.
Configuration Schema
services:
  torch:
    base_url: string
    username: string
    password: string
    extraction_timeout: duration # default: PT30M
    polling_interval: duration # default: PT5S
    max_polling_interval: duration # default: PT30S
  dimp:
    url: string
    bundle_split_threshold_mb: integer # 1-100, default: 10
  flattening:
    service_url: string
    lookup_path: string
    formats: [string] # ["csv"]
    timeout: duration # default: PT30M
    batch_size_mb: integer # default: 500
  send:
    send_as: string # "direct_resource_load", "transfer_load", or "s3_upload"
    url: string # required for FHIR modes; ignored for s3_upload
    batch_size: integer # 0-1000, default: 100 (direct_resource_load only)
    auth: # FHIR auth, or proxy auth in s3_upload mode
      username: string
      password: string
      oauth_issuer_uri: string
      oauth_client_id: string
      oauth_client_secret: string
    transfer: # transfer_load only
      project_identifier: string
      organization_identifier: string
    s3: # s3_upload only
      bucket: string # required
      region: string # required
      access_key_id: string # required
      secret_access_key: string # required
      endpoint: string # custom S3-compatible endpoint
      use_path_style: boolean # default: false
      timeout: duration # default: PT30M
  validation:
    url: string
    max_concurrent_requests: integer # default: 4
    bundle_chunk_size_mb: integer # default: 10
    fail_on_error: boolean # default: true
  local_import:
    dir: string
  crtdl_preprocessing:
    enabled: boolean # default: false
    enrichments_path: string # Path to external JSON file
    enrichments: # Inline enrichment rules
      - group_reference: string
        create_if_not_exists: # Optional: create group if not in CRTDL
          group_name: string
        attributes_to_add:
          - attribute_ref: string
            must_have: boolean
            linked_groups: [string] # Profile URLs, resolved to group IDs
pipeline:
  enabled_steps: [string]
retry:
  max_attempts: integer # 1-10, default: 5
  initial_backoff_ms: integer # default: 1000
  max_backoff_ms: integer # default: 30000
tls:
  ca_cert_path: string # PEM bundle of additional trusted certs
  insecure_skip_verify: boolean # default: false
compression:
  enabled: boolean # default: true
  level: string # fastest, default, better, best
jobs_dir: string # default: ./jobs

Services
TORCH
TORCH server for FHIR data extraction.
services:
  torch:
    base_url: "https://torch.example.org"
    username: "${TORCH_USER}"
    password: "${TORCH_PASSWORD}"
    extraction_timeout: PT30M
    polling_interval: PT5S
    max_polling_interval: PT30S

| Option | Type | Default | Description |
|---|---|---|---|
| base_url | string | - | TORCH server URL (required if torch step enabled) |
| username | string | - | Authentication username |
| password | string | - | Authentication password |
| extraction_timeout | duration | PT30M | Max wait time for extraction. Also serves as the safety net for transient polling errors: polling retries until this timeout is exceeded. |
| polling_interval | duration | PT5S | Initial status check interval |
| max_polling_interval | duration | PT30S | Max interval (exponential backoff cap) |
| file_ready_retries | int | 10 | Number of retries for file availability check |
| file_ready_interval | duration | PT10S | Interval between file availability checks |
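To make the interplay of polling_interval, max_polling_interval, and extraction_timeout concrete, here is a minimal sketch of the resulting polling schedule. It is not Aether's implementation: the helper names parse_duration and polling_delays are hypothetical, the parser only handles simple ISO 8601 time durations such as PT5S or PT30M, and the doubling factor is an assumption (the reference only states that the backoff is exponential and capped).

```python
import re
from datetime import timedelta

# Matches simple ISO 8601 time-only durations, e.g. PT5S, PT30M, PT1H30M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text: str) -> timedelta:
    """Parse a simple ISO 8601 duration; reject anything else."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def polling_delays(initial: str, cap: str, timeout: str, factor: float = 2.0):
    """Return the delays (seconds) slept before each status check.

    The delay grows by `factor` (assumed to be 2) after every check,
    never exceeds `cap`, and the schedule stops once the next sleep
    would push the total wait past `timeout`.
    """
    delay = parse_duration(initial).total_seconds()
    limit = parse_duration(cap).total_seconds()
    budget = parse_duration(timeout).total_seconds()
    delays, elapsed = [], 0.0
    while elapsed + delay <= budget:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, limit)
    return delays
```

With the default intervals and a shortened two-minute timeout, polling_delays("PT5S", "PT30S", "PT2M") yields waits of 5, 10, 20, 30, and 30 seconds.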
DIMP
DIMP pseudonymization service.
services:
  dimp:
    url: "http://dimp:32861"
    bundle_split_threshold_mb: 10

| Option | Type | Default | Description |
|---|---|---|---|
| url | string | - | DIMP server root URL (required if dimp step enabled). Do not include /fhir; the client appends it. |
| bundle_split_threshold_mb | int | 10 | Split Bundles larger than this (1-100 MB) |
Flattening
fhir-flattener service for FHIR to CSV transformation.
services:
  flattening:
    service_url: "http://fhir-flattener:8000"
    lookup_path: "/config/flatten-lookup.json"
    formats:
      - csv
    timeout: PT30M

| Option | Type | Default | Description |
|---|---|---|---|
| service_url | string | - | fhir-flattener service URL |
| lookup_path | string | - | Path to lookup table file |
| formats | []string | ["csv"] | Output formats |
| timeout | duration | PT30M | Request timeout |
| batch_size_mb | int | 500 | Total memory budget in MB, divided across attribute groups (0 = use default) |
Send
Destination server or object store for uploading processed data. Mode is selected via send_as.
Direct Resource Load
Upload FHIR resources directly to a FHIR server.
services:
  send:
    send_as: "direct_resource_load"
    url: "https://fhir-server.example.com"
    batch_size: 100
    auth:
      username: "${FHIR_USER}"
      password: "${FHIR_PASSWORD}"

Transfer Load
Package files for DSF-based transfer.
services:
  send:
    send_as: "transfer_load"
    url: "https://transfer.example.com"
    auth:
      oauth_issuer_uri: "${OAUTH_ISSUER}"
      oauth_client_id: "${OAUTH_CLIENT}"
      oauth_client_secret: "${OAUTH_SECRET}"
    transfer:
      project_identifier: "MII-PROJECT"
      organization_identifier: "your-org.example.de"

S3 Upload
Upload files to an S3-compatible bucket (AWS S3, MinIO, Ceph).
services:
  send:
    send_as: "s3_upload"
    s3:
      bucket: "${S3_BUCKET}"
      region: "eu-central-1"
      access_key_id: "${AWS_ACCESS_KEY_ID}"
      secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
      # endpoint: "http://minio.example.com:9000"
      # use_path_style: true
      # timeout: PT30M

| Option | Type | Default | Description |
|---|---|---|---|
| send_as | string | - | direct_resource_load, transfer_load, or s3_upload |
| url | string | - | FHIR server root URL. Required for FHIR modes, ignored for s3_upload. Do not include /fhir; the client appends it. |
| batch_size | int | 100 | Resources per transaction (direct_resource_load only, 0-1000) |
Authentication (choose one for FHIR modes):
| Option | Description |
|---|---|
| auth.username + auth.password | Basic authentication |
| auth.oauth_issuer_uri + auth.oauth_client_id + auth.oauth_client_secret | OAuth2 client credentials |
In s3_upload mode the auth block is optional and used only as upstream proxy authentication (basic auth via Proxy-Authorization); the S3 API itself is authenticated via s3.access_key_id / s3.secret_access_key.
Transfer settings (transfer_load mode only):
| Option | Description |
|---|---|
| transfer.project_identifier | MII project identifier |
| transfer.organization_identifier | Organization identifier |
S3 settings (s3_upload mode only):
| Option | Type | Default | Description |
|---|---|---|---|
| s3.bucket | string | - | Target bucket name (required) |
| s3.region | string | - | AWS region, e.g. eu-central-1 (required) |
| s3.access_key_id | string | - | S3 access key (required) |
| s3.secret_access_key | string | - | S3 secret key (required) |
| s3.endpoint | string | - | Custom endpoint URL (MinIO, Ceph, etc.). Leave empty for AWS S3. |
| s3.use_path_style | bool | false | Use path-style addressing (required for MinIO and many S3-compatible stores) |
| s3.timeout | duration | PT30M | Per-request timeout |
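The difference between the two addressing styles selected by s3.use_path_style can be illustrated with a short sketch. It only shows how the object URL is shaped in each style; the function name object_url and the bucket and key values are hypothetical, and Aether's actual S3 client may build its requests differently.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, use_path_style: bool) -> str:
    """Build the URL of an object for either S3 addressing style.

    Path-style puts the bucket in the URL path; virtual-hosted style
    puts it in front of the endpoint host name.
    """
    parts = urlsplit(endpoint)
    key_path = quote(key, safe="/")  # keep "/" as the key's path separator
    if use_path_style:
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key_path}", "", ""))
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key_path}", "", ""))


# Hypothetical MinIO endpoint, bucket, and key, for illustration only.
print(object_url("http://minio.example.com:9000", "aether", "jobs/a.ndjson.zst", True))
print(object_url("http://minio.example.com:9000", "aether", "jobs/a.ndjson.zst", False))
```

Virtual-hosted style needs DNS to resolve a host name per bucket, which self-hosted stores often do not provide; that is the usual reason path-style is required there.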
Validation
FHIR validation service for data quality checks.
services:
  validation:
    url: "http://validator:8080/fhir"
    max_concurrent_requests: 4
    bundle_chunk_size_mb: 10
    fail_on_error: true

| Option | Type | Default | Description |
|---|---|---|---|
| url | string | - | Validation service URL (required if validation step enabled) |
| max_concurrent_requests | int | 4 | Concurrent validation requests |
| bundle_chunk_size_mb | int | 10 | Bundle chunk size for batching resources (MB) |
| fail_on_error | bool | true | Stop pipeline when validation finds data quality errors |
When fail_on_error is true (default), the pipeline stops after the validation step completes with errors. When false, validation reports are written but the pipeline continues.
Local Import
Default directory for local FHIR imports.
services:
  local_import:
    dir: "/data/fhir"

| Option | Type | Description |
|---|---|---|
| dir | string | Default import directory (overridable with --dir flag) |
CRTDL Preprocessing
Enriches CRTDL documents with additional attributes before sending to TORCH. This is required when using DIMP pseudonymization, which needs certain identifier attributes (e.g., Patient.identifier) to be present in the CRTDL extraction query.
services:
  crtdl_preprocessing:
    enabled: true
    enrichments:
      - group_reference: "https://www.medizininformatik-initiative.de/fhir/core/modul-person/StructureDefinition/PatientPseudonymisiert"
        create_if_not_exists:
          group_name: "PatientPseudonymisiert"
        attributes_to_add:
          - attribute_ref: "Patient.identifier"
            must_have: false

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable CRTDL preprocessing |
| enrichments_path | string | - | Path to external JSON enrichment file |
| enrichments | list | - | Inline enrichment rules (mutually exclusive with enrichments_path) |
Enrichment rule options:
| Option | Type | Description |
|---|---|---|
| group_reference | string | Profile URL of the CRTDL attribute group to enrich (required) |
| create_if_not_exists.group_name | string | If group is missing from CRTDL, create it with this name |
| attributes_to_add[].attribute_ref | string | FHIR attribute reference to add (required) |
| attributes_to_add[].must_have | bool | Whether the attribute is required for extraction |
| attributes_to_add[].linked_groups | []string | Profile URLs to resolve to group IDs for cross-references |
External JSON file format:
When using enrichments_path, the file uses camelCase field names:
[
  {
    "groupReference": "https://www.medizininformatik-initiative.de/fhir/core/modul-person/StructureDefinition/PatientPseudonymisiert",
    "createIfNotExists": {
      "groupName": "PatientPseudonymisiert"
    },
    "attributesToAdd": [
      {
        "attributeRef": "Patient.identifier",
        "mustHave": false
      }
    ]
  }
]

A shorter syntax is also supported for group creation:
{
  "groupReference": "https://example.org/fhir/StructureDefinition/Patient",
  "addGroupIfNotExists": true,
  "attributesToAdd": [
    {"attributeRef": "Patient.identifier", "mustHave": false}
  ]
}

When addGroupIfNotExists is true, the group name is automatically derived from the last segment of the profile URL (e.g., "Patient" from the URL above). Use createIfNotExists with an explicit groupName if you need a custom name.
Note: addGroupIfNotExists and createIfNotExists are mutually exclusive. Unknown fields in the JSON file will produce an error.
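The group-creation rules for the external JSON file can be sketched as a small check. This is an illustration under stated assumptions, not Aether's code: the function name group_name_to_create is hypothetical, the allowed top-level fields are taken from the examples in this section, and the sketch assumes only addGroupIfNotExists set to true conflicts with createIfNotExists.

```python
from typing import Optional

# Top-level rule fields shown in this reference; anything else is rejected.
ALLOWED_FIELDS = {
    "groupReference",
    "createIfNotExists",
    "addGroupIfNotExists",
    "attributesToAdd",
}


def group_name_to_create(rule: dict) -> Optional[str]:
    """Return the name of the group to create, or None if no creation is requested."""
    unknown = set(rule) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    explicit = rule.get("createIfNotExists")
    derive = rule.get("addGroupIfNotExists", False)
    if explicit is not None and derive:
        raise ValueError("addGroupIfNotExists and createIfNotExists are mutually exclusive")

    if explicit is not None:
        return explicit["groupName"]  # custom name given explicitly
    if derive:
        # Derive the name from the last segment of the profile URL.
        return rule["groupReference"].rstrip("/").rsplit("/", 1)[-1]
    return None
```

For the short-syntax example, the derived name is "Patient"; for the createIfNotExists example, the explicit "PatientPseudonymisiert" is used.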
Pipeline
pipeline:
  enabled_steps:
    - local_import
    - dimp
    - flattening

| Option | Type | Default | Description |
|---|---|---|---|
| enabled_steps | []string | - | Pipeline steps to execute in order |
Available steps:
| Step | Description |
|---|---|
| torch | Import via TORCH (requires CRTDL) |
| local_import | Import from local directory |
| http_import | Import from HTTP URL |
| dimp | Pseudonymize via DIMP |
| wait | Pause for manual inspection |
| flattening | Transform to CSV (requires CRTDL) |
| send | Upload to destination server |
| validation | Validate FHIR data against profiles |
Rules:
- One import step must be first (torch, local_import, or http_import)
- Wait step cannot be first or consecutive
- Flattening requires CRTDL input
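The ordering rules above can be expressed as a small pre-flight check. This is a sketch, not Aether's validator: the function name check_steps and the has_crtdl flag are hypothetical, and the error messages are illustrative. Rejecting unknown step names is an added assumption, and the CRTDL requirement for torch is taken from the steps table.

```python
IMPORT_STEPS = {"torch", "local_import", "http_import"}
KNOWN_STEPS = IMPORT_STEPS | {"dimp", "wait", "flattening", "send", "validation"}


def check_steps(steps, has_crtdl=False):
    """Return a list of rule violations for an enabled_steps list (empty if valid)."""
    errors = [f"unknown step: {step}" for step in steps if step not in KNOWN_STEPS]

    # An import step must come first; this also rules out a leading wait step.
    if not steps or steps[0] not in IMPORT_STEPS:
        errors.append("first step must be torch, local_import, or http_import")

    # Two wait steps may not follow each other directly.
    if any(prev == cur == "wait" for prev, cur in zip(steps, steps[1:])):
        errors.append("wait steps cannot be consecutive")

    # Steps that need a CRTDL document as job input.
    if not has_crtdl:
        for step in ("torch", "flattening"):
            if step in steps:
                errors.append(f"{step} requires CRTDL input")
    return errors
```

For example, check_steps(["local_import", "wait", "wait"]) reports the consecutive wait steps, while check_steps(["torch", "dimp", "send"], has_crtdl=True) returns an empty list.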
Retry
retry:
  max_attempts: 5
  initial_backoff_ms: 1000
  max_backoff_ms: 30000

| Option | Type | Default | Range | Description |
|---|---|---|---|---|
| max_attempts | int | 5 | 1-10 | Max retry attempts for transient errors |
| initial_backoff_ms | int | 1000 | - | Initial backoff delay |
| max_backoff_ms | int | 30000 | - | Max backoff delay |
Exponential backoff: wait = min(initial * 2^attempt, max)
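As a worked example of the formula, the following sketch computes the wait for each attempt with the default settings. The function name backoff_ms is hypothetical, and counting attempts from 0 is an assumption; the reference does not say whether the first retry uses attempt 0 or 1.

```python
def backoff_ms(attempt: int, initial_ms: int = 1000, max_ms: int = 30000) -> int:
    """wait = min(initial * 2^attempt, max), with attempt assumed to start at 0."""
    return min(initial_ms * 2 ** attempt, max_ms)


# Waits for the default max_attempts of 5.
schedule = [backoff_ms(attempt) for attempt in range(5)]
print(schedule)  # [1000, 2000, 4000, 8000, 16000]
```

Under this assumption the cap only takes effect from the sixth attempt on, where 32000 ms is clamped to 30000 ms.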
TLS
Trust custom or internal certificates and, when needed, disable verification entirely. Applied to every outgoing HTTP client (TORCH, DIMP, validation, flattening, send, HTTP import).
tls:
  ca_cert_path: "/path/to/certs.pem"
  insecure_skip_verify: false

| Option | Type | Default | Description |
|---|---|---|---|
| ca_cert_path | string | - | PEM bundle of additional CA or server certificates to trust. System CAs remain trusted alongside these. Supports ${ENV} substitution. |
| insecure_skip_verify | bool | false | Skip TLS verification entirely. Development/testing only. |
Compression
compression:
  enabled: true
  level: "default"

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable zstd compression |
| level | string | "default" | Compression level |
Compression levels:
| Level | Speed | Ratio | Use Case |
|---|---|---|---|
| fastest | ~500 MB/s | ~3-4x | Large datasets, CPU-constrained |
| default | ~200 MB/s | ~4-5x | Balanced (recommended) |
| better | ~100 MB/s | ~5-6x | Storage-constrained |
| best | ~50 MB/s | ~6-7x | Archival |
Output files use .ndjson.zst extension when enabled. Aether auto-detects and reads both compressed and uncompressed files.
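One common way to tell compressed from uncompressed files is to look at the first bytes rather than the extension. The sketch below checks for the Zstandard frame magic number; it is only an illustration of that technique, since this reference does not say how Aether performs its detection, and the function name is_zstd_file is hypothetical.

```python
# Zstandard frames start with the magic number 0xFD2FB528, stored little-endian.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd_file(path: str) -> bool:
    """Return True if the file begins with a standard Zstandard frame header."""
    with open(path, "rb") as handle:
        return handle.read(len(ZSTD_MAGIC)) == ZSTD_MAGIC
```

A reader built on this check could open .ndjson and .ndjson.zst files through the same code path and only wrap the stream in a decompressor when the magic number matches.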
Jobs Directory
jobs_dir: "./jobs"

Directory for job state and data files.
Environment Variables
All string values support environment variable substitution:
services:
  torch:
    username: "${TORCH_USERNAME}"
    password: "${TORCH_PASSWORD}"
  send:
    url: "${FHIR_SERVER_URL}"

Example Configurations
TORCH + DIMP
services:
  torch:
    base_url: "https://torch.hospital.org"
    username: "${TORCH_USER}"
    password: "${TORCH_PASS}"
  dimp:
    url: "http://dimp:32861"
pipeline:
  enabled_steps:
    - torch
    - dimp
jobs_dir: "./jobs"

Local Import with Flattening
services:
  local_import:
    dir: "/data/fhir"
  dimp:
    url: "http://dimp:32861"
  flattening:
    service_url: "http://fhir-flattener:8000"
    lookup_path: "/config/lookup.json"
pipeline:
  enabled_steps:
    - local_import
    - dimp
    - flattening
compression:
  enabled: true
  level: "default"
jobs_dir: "./jobs"

Full Pipeline with Send
services:
  torch:
    base_url: "https://torch.hospital.org"
    username: "${TORCH_USER}"
    password: "${TORCH_PASS}"
  dimp:
    url: "http://dimp:32861"
  send:
    send_as: "transfer_load"
    url: "https://transfer.mii.de"
    auth:
      oauth_issuer_uri: "${OAUTH_ISSUER}"
      oauth_client_id: "${OAUTH_CLIENT}"
      oauth_client_secret: "${OAUTH_SECRET}"
    transfer:
      project_identifier: "MII-PROJECT"
      organization_identifier: "hospital.example.de"
pipeline:
  enabled_steps:
    - torch
    - dimp
    - send
retry:
  max_attempts: 5
compression:
  enabled: true
jobs_dir: "/data/aether/jobs"

Next Steps
- CLI Commands - Command reference
- Pipeline Steps - Step details