A workflow engine is defined as part of a context. A context is currently limited to one workflow engine. The workflow engine will manage the execution of any workflows submitted by Amazon Genomics CLI. When the context is deployed, an endpoint will be made available to Amazon Genomics CLI through which it will submit workflows and workflow commands to the engine according to the WES API specification.
Supported Engines and Workflow Languages
Currently, Amazon Genomics CLI’s officially supported engines can be used to run the following workflows:
|Engine||Language||Language Versions||Run Mode|
|Cromwell||WDL||All versions up to 1.0||Server|
|Nextflow||Nextflow DSL||Standard and DSL 2||Head Process|
|miniwdl||WDL||documented here||Head Process|
|Snakemake||Snakemake||All versions||Head Process|
|Toil||CWL||All versions up to 1.2||Server|
Overtime we plan to add additional engine and language support and provide the ability for third party developers to develop engine plugins.
In server mode the engine runs as a long-running process that exists for the lifetime of the context. All workflow instances sent to the context are handled by that server. The server resides on on-demand instances to prevent Spot interruption even if the workflow tasks are run on Spot instances
Head process engines are run when a workflow is submitted, manage a single workflow and only run for the lifetime of the workflow. If multiple workflows are submitted to a context in parallel then multiple head processes are spawned. The head processes always run on on-demand resources to prevent Spot interruption even if the workflow tasks are run on Spot instances.
An engine is defined within a
context definition of the project YAML file file as a map. For example, the following snippet
defines a WDL engine of type
cromwell as part of the context named
onDemandCtx. There may be one engine defined
for each supported language.
contexts: onDemandCtx: requestSpotInstances: false engines: - type: wdl engine: cromwell
The costs associated with an engine depend on the actual infrastructure required by the engine. In the case of the Cromwell, the engine runs in “server” mode as an AWS ECS Fargate container using an Amazon Elastic File System for metadata storage. The container will be running for the entire time the context is deployed, even when no workflows are running. To avoid this cost we recommend destroying the context when it is not needed. The Nextflow engine runs as a single batch job per workflow instance and is only running when workflows are running.
In both cases a serverless WES API endpoint is deployed through Amazon API Gateway to act as the interface between Amazon Genomics CLI and the engine.
Being part of a context, engine related infrastructure is tagged with the context name, username and project name. These tags may be used to help differentiate costs.
Supported engines are currently deployed with configurations that allow them to make use of files in S3 and submit workflows as jobs to AWS Batch. Because the current generation of engines we support do not directly support the WES API, adapters are deployed as Fargate container tasks. AWS API Gateway is used to provide a gateway between Amazon Genomics CLI and the WES adapters.
workflow commands are issued on Amazon Genomics CLI, it will send WES API calls to the appropriate endpoint. The adapter mapped
to that endpoint will then translate the WES command and either send the command to the engines REST API for Cromwell, or
spawn a Nextflow engine task and submit the workflow with that task. At this point the engine is responsible for creating
controlling and destroying the resources that will be used for task execution.