aether DUP pipeline coordinator
The DUP pipeline requires a Data Node to be set up with all the services required by the pipeline.
Please refer to the architecture of a data node here and to the list of all data node services here.
Installing and using aether as part of the Data Node
To install aether, follow the installation instructions here.
For version compatibility with the dataportal see
Setting up aether
To use aether, all the services that aether uses for your pipeline need to be set up (see the pipeline steps above).
Additionally, to flatten data, aether needs a flatteningLookup.json file, which can be downloaded for the newest ontology published for the data node.
To download the flatteningLookup for a particular ontology version, use the get_flattening_lookup.sh script of this repository (see here).
Using aether
Aether uses a .yml config file which allows you to configure which steps should be included in your DUP pipeline.
Example call for aether in this context, once installed, run from the example directory:
aether pipeline --config base-pipeline-config.yml start queries/example-crtdl.json
It creates a job directory which, for each DUP project, saves the output of each step, so that you can branch off from or review the output of any step.
For all configuration options, see base-pipeline-config.yml.
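If you want to start the pipeline from a script rather than by hand, the call above can be wrapped as in the following minimal sketch. The function names build_aether_command and run_pipeline are hypothetical helpers introduced here for illustration; the only part taken from this document is the command shape itself. The sketch assumes the aether binary is on your PATH and that aether reports its progress on standard output.

```python
import shlex
import subprocess
from typing import List


def build_aether_command(config: str, query: str) -> List[str]:
    """Build the argument list for the aether call shown above."""
    return ["aether", "pipeline", "--config", config, "start", query]


def run_pipeline(config: str, query: str, workdir: str = ".") -> str:
    """Run aether in workdir and return whatever it wrote to stdout.

    Assumption: aether is installed and on PATH, and workdir contains
    the config file and the query file given as relative paths.
    """
    result = subprocess.run(
        build_aether_command(config, query),
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the command here; call run_pipeline(...) to execute it.
    cmd = build_aether_command(
        "base-pipeline-config.yml", "queries/example-crtdl.json"
    )
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting issues if your config or query path contains spaces.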
The DUP Reference Pipeline Detailed
Zooming in, the more detailed pipeline can be depicted as follows:
aether use - General
See aether documentation here.
For an example configuration see the base configuration in our example setup here.
aether simple example to get started
First, install aether locally, following the installation instructions here.
To get started using aether, configure a simple pipeline as shown here, and then run aether in the aether folder of your data node using:
aether pipeline --config base-pipeline-config-simple.yml start queries/example-crtdl.json
Aether will run and then print the ID of your job, e.g. Job ID: 20260331_0915_5932b1e1-0ed5-4bab-902e-25f328209390, which corresponds directly to a folder in your jobs directory.
For this simple example, you will find your extracted data in the import folder in the directory of your specific job.
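The mapping from the printed job ID to the job folder can be sketched as follows. This is an illustration only: the ID pattern is inferred from the single example above (date, time, UUID) and is not a documented format, the helpers extract_job_id and job_dir are hypothetical, and the path of the jobs directory is an assumption that depends on your setup.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed pattern, inferred from the one example ID in this document:
# <YYYYMMDD>_<HHMM>_<UUID>
JOB_ID_RE = re.compile(
    r"Job ID:\s*(\d{8}_\d{4}_[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def extract_job_id(output: str) -> Optional[str]:
    """Return the first job ID found in aether's output, or None."""
    match = JOB_ID_RE.search(output)
    return match.group(1) if match else None


def job_dir(jobs_root: str, job_id: str) -> Path:
    """Return the folder of one job below the jobs directory."""
    return Path(jobs_root) / job_id


if __name__ == "__main__":
    line = "Job ID: 20260331_0915_5932b1e1-0ed5-4bab-902e-25f328209390"
    job_id = extract_job_id(line)
    if job_id is not None:
        # "jobs" is a placeholder for the jobs directory of your data node.
        print(job_dir("jobs", job_id) / "import")
```

If the pattern does not match the output of your aether version, extract_job_id returns None instead of guessing.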
Note that aether always creates all necessary folders for all supported steps:
- import (TORCH export directory)
- pseudonymized (Pseudonymized data)
- validation (output information from the validation step - note this does not contain the data but validation results instead)
- csv (flattened output if csv is chosen)
- send (information about the send step)
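Because all of these folders exist for every job, an empty folder does not by itself tell you that a step failed. A quick way to see which steps actually produced output is to count the files per folder, as in this minimal sketch. The helper count_step_files is hypothetical; the folder names are the ones listed above, and the job path you pass in is whatever folder matches your job ID.

```python
from pathlib import Path
from typing import Dict

# Step folders as listed in this document.
STEP_FOLDERS = ("import", "pseudonymized", "validation", "csv", "send")


def count_step_files(job_path: str) -> Dict[str, int]:
    """Count regular files (recursively) below each step folder of a job."""
    root = Path(job_path)
    counts: Dict[str, int] = {}
    for step in STEP_FOLDERS:
        folder = root / step
        if folder.is_dir():
            counts[step] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            # Folder missing entirely, e.g. the path is not a job folder.
            counts[step] = 0
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_step_files.py <path-to-job-folder>
    for step, n in count_step_files(sys.argv[1]).items():
        print(f"{step}: {n} file(s)")
```

A count of 0 for a step usually means the step was not part of the configured pipeline or produced no files; check the configuration before treating it as an error.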