Toil
Description
Toil is a workflow engine developed by the Computational Genomics Lab at the UC Santa Cruz Genomics Institute. In Amazon Genomics CLI, Toil can be deployed in a context as an engine to run workflows written in the Common Workflow Language (CWL) standard, versions v1.0, v1.1, and v1.2 (or a mix of these versions).
Toil is an open source project distributed by UC Santa Cruz under the Apache 2 license and available on GitHub.
Architecture
There are two components of a Toil engine as deployed in an Amazon Genomics CLI context:
Toil Server
The Toil engine runs in “server mode” as a container service in ECS. The engine can run multiple workflows asynchronously. Workflow tasks are run in an elastic compute environment and monitored by Toil. Amazon Genomics CLI communicates with the Toil engine through the GA4GH WES REST service that the server exposes via API Gateway.
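As a rough illustration of the WES interface described above, the following Python sketch assembles the form fields for a run submission. The `ga4gh/wes/v1/runs` path and the field names follow the GA4GH WES 1.0 specification; the base URL, the workflow file name, and the helper `build_run_request` are hypothetical, and any authentication the API Gateway endpoint requires is omitted.

```python
import json
from urllib.parse import urljoin

# Path defined by the GA4GH WES 1.0 specification for creating runs.
WES_RUNS_PATH = "ga4gh/wes/v1/runs"

# CWL versions this document lists as supported by the Toil engine.
SUPPORTED_CWL_VERSIONS = ("v1.0", "v1.1", "v1.2")


def build_run_request(base_url, workflow_url, cwl_version, params):
    """Return (url, form_fields) for a WES run-submission POST.

    base_url is a placeholder for the API Gateway endpoint of the context;
    this helper only builds the request and does not send it.
    """
    if cwl_version not in SUPPORTED_CWL_VERSIONS:
        raise ValueError(f"unsupported CWL version: {cwl_version}")
    url = urljoin(base_url.rstrip("/") + "/", WES_RUNS_PATH)
    fields = {
        "workflow_url": workflow_url,
        "workflow_type": "CWL",
        "workflow_type_version": cwl_version,
        # WES expects the workflow parameters as a JSON-encoded string.
        "workflow_params": json.dumps(params),
    }
    return url, fields


if __name__ == "__main__":
    url, fields = build_run_request(
        "https://example.invalid/prod",  # hypothetical endpoint
        "workflow.cwl",                  # hypothetical workflow file
        "v1.2",
        {"message": "hello"},
    )
    print(url)
    print(fields)
```

In normal use Amazon Genomics CLI makes this call on your behalf, so the sketch is only meant to show what travels over the WES interface.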
Task Compute Environment
Workflow tasks are submitted by Toil to an AWS Batch queue and run in Toil-provided containers using an AWS Batch compute environment. Tasks that use the CWL DockerRequirement are additionally run in sibling containers on the host Docker daemon. AWS Batch coordinates the elastic provisioning of EC2 instances (container hosts) based on the available work in the queue. Batch places containers on container hosts as space allows.
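For readers unfamiliar with DockerRequirement, here is a minimal sketch of a CWL tool description that would trigger the sibling-container path. It is written as a Python dictionary and serialized to JSON, which works because CWL documents are YAML and JSON is a subset of YAML. The image name, the command, and the helper `docker_tool` are illustrative assumptions, not part of Toil or Amazon Genomics CLI.

```python
import json


def docker_tool(image, base_command):
    """Build a minimal CWL CommandLineTool that declares a DockerRequirement."""
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": base_command,
        "requirements": [
            # dockerPull names the image the tool's command should run in.
            {"class": "DockerRequirement", "dockerPull": image},
        ],
        "inputs": [],
        "outputs": [],
    }


if __name__ == "__main__":
    # Hypothetical image and command, chosen only for illustration.
    tool = docker_tool("ubuntu:22.04", ["echo", "hello"])
    print(json.dumps(tool, indent=2))
```

A tool without the `requirements` entry would run directly in the Toil-provided container instead.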
Disk Expansion
Container hosts in the Batch compute environment use EBS volumes as local scratch space. As an EBS volume approaches a capacity threshold, new EBS volumes will be attached and merged into the file system. These volumes are destroyed when AWS Batch terminates the container host. CWL disk space requirements are ignored by Toil when running against AWS Batch.
As a result, a workflow that succeeds on Amazon Genomics CLI (AGC) may fail on other CWL runners because it does not request enough disk space, and a workflow that succeeds on other CWL runners may fail on AGC because it consumes disk space faster than the expansion process can react.
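The second failure mode can be pictured with a toy simulation. The sketch below is not how the expansion process is implemented; the threshold, volume sizes, and reaction delay are all made-up parameters, chosen only to show that a fast writer can fill the disk before a new volume arrives.

```python
def survives(write_rate_gib_s, duration_s, initial_gib=100.0,
             threshold=0.9, added_gib=100.0, reaction_s=60):
    """Simulate a task writing to scratch space, one second per step.

    All default values are hypothetical. Returns True if the task finishes
    without ever exceeding the capacity attached at that moment.
    """
    capacity = initial_gib
    used = 0.0
    pending = None  # seconds left until a requested volume is attached
    for _ in range(duration_s):
        used += write_rate_gib_s
        if used > capacity:
            return False  # disk filled before expansion could react
        if pending is not None:
            pending -= 1
            if pending == 0:
                capacity += added_gib
                pending = None
        elif used >= threshold * capacity:
            pending = reaction_s  # threshold crossed: request a new volume
    return True


if __name__ == "__main__":
    # Slow writer: 300 GiB over 3000 s, expansion keeps up.
    print(survives(0.1, 3000))
    # Fast writer: 300 GiB over 300 s, disk fills during the reaction delay.
    print(survives(1.0, 300))
```

Both runs write the same total amount of data; only the write rate differs, which is why declaring more disk space in CWL would not change the outcome on AGC.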