Getting Started

The following links will help you install Amazon Genomics CLI and quickly run a demo workflow.

1 - Prerequisites

To run Amazon Genomics CLI the following prerequisites must be met:

  • A computer with one of the following operating systems:
    • macOS 10.14+
    • Amazon Linux 2
    • Ubuntu 20.04
    • Windows 10 with Windows Subsystem for Linux (WSL) running Ubuntu, where the commands are run
  • Internet access
  • An AWS Account
  • An AWS role with sufficient access. To generate the minimum required policies for admins and users, please follow the instructions here

Amazon Genomics CLI has not been tested on Windows, but it should run in WSL 2 with Ubuntu 20.04.

Prerequisite installation

Ubuntu 20.04

  • Install Node.js
curl -fsSL https://deb.nodesource.com/setup_15.x | sudo -E bash -
sudo apt-get install -y nodejs
  • Install and configure AWS CLI
sudo apt install awscli
aws configure
# ... set access key ID, secret access key, and region

Amazon Linux 2 (e.g. on an EC2 instance)

  • Install Node.js
curl -sL https://rpm.nodesource.com/setup_16.x | sudo -E bash -
sudo yum install -y nodejs
  • If you have not already done so, configure your AWS credentials and default region
aws configure

macOS

  • Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • Install Node.js
brew install node
  • Install and configure AWS CLI
brew install awscli
aws configure
# ... set access key ID, secret access key, and region
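
Before moving on to installation, it can help to confirm that the tools installed above are on your PATH. The following is a minimal sketch rather than part of Amazon Genomics CLI: the function names missing_tools and check_prerequisites are hypothetical, and the tool list (node, aws, curl, unzip) is an assumption based on the commands used in this guide.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools this guide relies on (assumed list: node, aws, curl, unzip).
# Returns 0 if all are present, 1 otherwise.
check_prerequisites() {
    missing=$(missing_tools node aws curl unzip)
    if [ -n "$missing" ]; then
        printf 'Missing prerequisites:\n%s\n' "$missing" >&2
        return 1
    fi
    printf 'All prerequisites found.\n'
}
```

After sourcing or pasting these functions into your shell, run check_prerequisites. It only checks that the commands exist, not that your AWS credentials are configured.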

2 - Installation

Download and install Amazon Genomics CLI

Download the Amazon Genomics CLI zip, unzip its contents, and run the install.sh script.

To download the latest release, or a specific earlier release, navigate to the releases page of our GitHub repo: https://github.com/aws/amazon-genomics-cli/releases/

Once you have downloaded a release, type the following to install:

unzip amazon-genomics-cli-<version>.zip
cd amazon-genomics-cli/
./install.sh

The install script places the agc command in $HOME/bin.

The latest nightly build can be accessed here: s3://healthai-public-assets-us-east-1/amazon-genomics-cli/nightly-build/amazon-genomics-cli.zip

You can download the nightly build by running the following:

aws s3api get-object --bucket healthai-public-assets-us-east-1 --key amazon-genomics-cli/nightly-build/amazon-genomics-cli.zip amazon-genomics-cli.zip

The nightly build is saved as amazon-genomics-cli.zip, so unzip that file name (without a version suffix) and then run install.sh as shown above.

The Amazon Genomics CLI is a statically compiled Go binary. It should run in your environment natively without any additional setup. Test the CLI with:

$ agc --help

🧬 Launch and manage genomics workloads on AWS.

Commands
  Getting Started 🌱
    account     Commands for AWS account setup.
                Install or remove AGC from your account.

  Contexts
    context     Commands for contexts.
                Contexts specify workflow engines and computational fleets to use when running a workflow.

  Logs
    logs        Commands for various logs.

  Projects
    project     Commands to interact with projects.

  Workflows
    workflow    Commands for workflows.
                Workflows are potentially-dynamic graphs of computational tasks to execute.

  Settings ⚙️
    configure   Commands for configuration.
                Configuration is stored per user.

Flags
      --format string   Format option for output. Valid options are: text, table, json (default "text")
  -h, --help            help for agc
      --silent          Suppresses all diagnostic information.
  -v, --verbose         Display verbose diagnostic information.
      --version         version for agc
Examples
  Displays the help menu for the specified sub-command.
  `$ agc account --help`

If this doesn’t work immediately, try one of the following:

  • Start a new terminal shell.
  • Modify your $HOME/.bashrc (or equivalent file) by appending the following line, then restart your shell:
export PATH=$HOME/bin:$PATH
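
If your shell startup file is sourced more than once, a plain export line adds $HOME/bin to PATH repeatedly. The sketch below shows one way to make the change idempotent. The helper name path_prepend_once is hypothetical and is not provided by Amazon Genomics CLI.

```shell
# Prepend a directory to PATH only if it is not already listed.
path_prepend_once() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present, nothing to do
        *) PATH="$1:$PATH"; export PATH ;;  # otherwise put it first
    esac
}

# Make the agc install location visible to the shell.
path_prepend_once "$HOME/bin"
```

Wrapping PATH in colons before matching avoids false positives on directories that merely contain the target as a substring, such as $HOME/bin2.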

If you are running this on macOS, you may see the popup window below when you first run an agc command, due to Apple’s security restrictions.

image of cannot open popup

Click Cancel and navigate to Apple’s System Preferences, click Security & Privacy, then click General. Near the bottom, you will see a line indicating "agc" was blocked from use because it is not from an identified developer. To the right, click Allow Anyway.

Now go back to the terminal and run agc --help again. A new popup window, shown below, asks you to override the system security.

image of cannot verify developer popup

Click Open to finish. Amazon Genomics CLI is now correctly installed.

Verify that you have the latest version of Amazon Genomics CLI with:

agc --version

If you do not, you may need to uninstall any previous versions of Amazon Genomics CLI and reinstall the latest.

Command Completion

Amazon Genomics CLI can generate shell completion scripts that enable ‘Tab’ completion of commands. Command completion is optional and not required to use Amazon Genomics CLI. To generate a completion script you can use:

 agc completion <shell>

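where <shell> is one of bash, zsh, fish, or powershell. Setup for each shell follows.

Bash:

To load completions in the current session:

source <(agc completion bash)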

To load completions for each session, execute once:

Linux:

agc completion bash > /etc/bash_completion.d/agc

macOS:

If you haven’t already installed bash-completion, execute the following once:

brew install bash-completion

Then add the following line to your ~/.bash_profile:

[[ -r "/usr/local/etc/profile.d/bash_completion.sh" ]] && . "/usr/local/etc/profile.d/bash_completion.sh"

Once bash-completion is installed, run:

agc completion bash > /usr/local/etc/bash_completion.d/agc

On Apple Silicon Macs, Homebrew typically installs under /opt/homebrew rather than /usr/local, so adjust both paths accordingly (brew --prefix prints the prefix in use).

Zsh:

If shell completion is not already enabled in your environment, you will need to enable it. You can execute the following once:

echo "autoload -U compinit; compinit" >> ~/.zshrc

To load completions for each session, execute once:

agc completion zsh > "${fpath[1]}/_agc"

You will need to start a new shell for this setup to take effect.

fish:

agc completion fish | source

To load completions for each session, execute once:

agc completion fish > ~/.config/fish/completions/agc.fish

PowerShell:

agc completion powershell | Out-String | Invoke-Expression

To load completions for every new session, run:

agc completion powershell > agc.ps1

and source this file from your PowerShell profile.
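
If you script your environment setup, you may want to choose the completion argument automatically. The sketch below maps a shell path, such as the value of $SHELL, to one of the shell names used in the commands above. The function name completion_shell_name is hypothetical, and treating pwsh as PowerShell is an assumption about how your PowerShell executable is named.

```shell
# Map a shell path (for example the value of $SHELL) to the name that
# `agc completion <shell>` expects. Prints nothing and returns 1 for
# shells not covered in this guide.
completion_shell_name() {
    case "${1##*/}" in
        bash|zsh|fish) printf '%s\n' "${1##*/}" ;;
        pwsh|powershell) printf 'powershell\n' ;;
        *) return 1 ;;
    esac
}
```

For example, shell=$(completion_shell_name "$SHELL") && agc completion "$shell" would print the completion script for your login shell, which you can then save to the location described above for that shell.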

3 - Setup

Account activation

To start using Amazon Genomics CLI with your AWS account, you need to activate it.

agc account activate

This creates the core infrastructure that Amazon Genomics CLI needs to operate, which includes a DynamoDB table, an S3 bucket, and a VPC. Activation takes about 5 minutes to complete. You only need to do this once per account and region.

The DynamoDB table is used by the CLI for persistent state. The S3 bucket is used for durable workflow data and Amazon Genomics CLI metadata, and the VPC is used to isolate compute resources. If needed, you can specify your own preexisting S3 bucket or VPC using the --bucket and --vpc options.
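
As an illustration of the --bucket and --vpc options, the sketch below assembles the activation command and prints it for review instead of running it. The function name activate_command and the example values my-existing-bucket and vpc-0123456789abcdef0 are hypothetical placeholders. The exact value format each option expects is not stated in this guide, so check agc account activate --help before running the printed command.

```shell
# Print the activation command for an optional existing bucket and VPC.
# Arguments: bucket (may be empty), VPC ID (may be empty).
# The command is only printed, never executed.
activate_command() {
    cmd="agc account activate"
    if [ -n "$1" ]; then
        cmd="$cmd --bucket $1"
    fi
    if [ -n "$2" ]; then
        cmd="$cmd --vpc $2"
    fi
    printf '%s\n' "$cmd"
}

# Placeholder values, replace with resources from your own account.
activate_command my-existing-bucket vpc-0123456789abcdef0
```

Passing an empty string for either argument omits that option, so Amazon Genomics CLI creates the corresponding resource itself.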

CDK Bootstrap

Amazon Genomics CLI uses AWS CDK to deploy infrastructure. Activating an account will bootstrap the AWS environment for CDK app deployments. CDK Bootstrap deploys the infrastructure needed to allow CDK to deploy CDK-defined infrastructure. Full details are available in the AWS CDK bootstrapping documentation.

Define a username

Amazon Genomics CLI requires that you define a username and email. You can do this using the following command:

agc configure email you@youremail.com

The username only needs to be configured once on each computer from which you use Amazon Genomics CLI.

4 - Hello world

When you install Amazon Genomics CLI it will create a folder named agc. Inside there is an examples/demo-wdl-project folder containing an agc-project.yaml file that defines several demo workflows, including a simple “hello world” workflow.

Running Hello World

  1. Ensure you have met the prerequisites and installed Amazon Genomics CLI.
  2. Ensure you have followed the activation steps.
  3. Change to the demo project directory:
     cd ~/amazon-genomics-cli/examples/demo-wdl-project
  4. Deploy the context. This step takes approximately 5 minutes to deploy the infrastructure:
     agc context deploy --context myContext
  5. Run the workflow and take note of the returned workflow instance ID:
     agc workflow run hello --context myContext
  6. Check on the status of the workflow:
     agc workflow status -r <workflow-instance-id>
     Initially you will see a status like SUBMITTED, but after the elastic compute resources have been spun up and the workflow runs, you should see something like the following:
     WORKFLOWINSTANCE myContext 9ff7600a-6d6e-4bda-9ab6-c615f5d90734 COMPLETE 2021-09-01T20:17:49Z

Congratulations! You have just run your first workflow in the cloud using Amazon Genomics CLI! At this point you can run additional workflows, including submitting several instances of the “hello world” workflow. The elastic compute resources will expand and contract as necessary to accommodate the backlog of submitted workflows.
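
Rather than re-running the status command by hand, you could poll it from a script. The sketch below is a minimal example under stated assumptions: it assumes the status line has the whitespace-separated layout of the sample WORKFLOWINSTANCE line in the steps above, with the state in the fourth field. The function names instance_state and wait_for_workflow are hypothetical. Failure states are not documented in this guide, so the loop simply gives up after a fixed number of attempts.

```shell
# Extract the state (fourth whitespace-separated field) from the first
# WORKFLOWINSTANCE line in the given text.
instance_state() {
    printf '%s\n' "$1" | awk '$1 == "WORKFLOWINSTANCE" { print $4; exit }'
}

# Poll `agc workflow status` until the instance reports COMPLETE.
# Arguments: instance ID, max attempts (default 60), delay in seconds (default 30).
# Returns 0 on COMPLETE, 1 if the attempts run out first.
wait_for_workflow() {
    attempts=${2:-60}
    delay=${3:-30}
    while [ "$attempts" -gt 0 ]; do
        state=$(instance_state "$(agc workflow status -r "$1")")
        printf 'state: %s\n' "${state:-unknown}"
        if [ "$state" = "COMPLETE" ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}
```

Call it as wait_for_workflow <workflow-instance-id>, using the ID returned by agc workflow run. With the defaults it waits up to about 30 minutes.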

Reviewing the Results

Workflow results are written to an S3 bucket specified or created by Amazon Genomics CLI during account activation. You can list or retrieve the S3 URI for the bucket with:

AGC_BUCKET=$(aws ssm get-parameter \
    --name /agc/_common/bucket \
    --query 'Parameter.Value' \
    --output text)

You can then use aws s3 commands to explore and retrieve data from the bucket. Workflow output will be in the following path:

s3://agc-<account-num>-<region>/project/<project-name>/userid/<user-id>/context/<context-name>/workflow/<workflow-name>/

The rest of the path depends on the engine used to run the workflow. For Cromwell it will continue with:

.../cromwell-execution/<wdl-wf-name>/<workflow-run-id>/<task-name>
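
To avoid typing the full path by hand, the prefix can be assembled from its parts. This is a sketch only: the function name workflow_output_prefix is hypothetical, and it assumes the first argument is the bucket's S3 URI (for example the value stored in AGC_BUCKET, if that parameter holds a URI as the text above suggests).

```shell
# Build the S3 prefix under which a workflow's output is written.
# Arguments: bucket S3 URI, project name, user ID, context name, workflow name.
# A trailing slash on the bucket URI is dropped to avoid a double slash.
workflow_output_prefix() {
    printf '%s/project/%s/userid/%s/context/%s/workflow/%s/\n' \
        "${1%/}" "$2" "$3" "$4" "$5"
}
```

You could then list the output with aws s3 ls "$(workflow_output_prefix "$AGC_BUCKET" <project-name> <user-id> myContext hello)", substituting your own project name and user ID.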

If a workflow declares outputs then you may obtain these using the command:

agc workflow output <workflow_run_id>

You should see a response similar to:

OUTPUT	id	6cc6f742-dc87-4649-b319-1af45c4c09c6
OUTPUT	outputs.hello_agc.hello.out	Hello Amazon Genomics CLI!
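
If you need a single value from this response in a script, you can filter the lines by key. The sketch below assumes the response is tab-separated with the fields OUTPUT, key, and value, as in the sample above. The function name output_value is hypothetical.

```shell
# Print the value for one key from tab-separated OUTPUT lines read on stdin.
# Argument: the key to look up, for example outputs.hello_agc.hello.out
output_value() {
    awk -F '\t' -v key="$1" '$1 == "OUTPUT" && $2 == key { print $3; exit }'
}
```

For example, agc workflow output <workflow_run_id> | output_value outputs.hello_agc.hello.out would print only the greeting text. If you select --format json instead, a JSON-aware tool is likely a better fit than this line-based filter.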

You can also obtain task logs for a workflow using a command of the following form:

agc logs workflow <workflow-name> -r <instance-id>

Note that if the workflow did not actually run any tasks due to call caching, this command produces no output.

Cleaning Up

Once you are done with myContext you can destroy it with:

agc context destroy myContext

This will remove the cloud resources associated with the named context, but will keep any S3 outputs and CloudWatch logs.

If you want to stop using Amazon Genomics CLI in your AWS account entirely, you need to deactivate it:

agc account deactivate

This will remove Amazon Genomics CLI’s core infrastructure. If Amazon Genomics CLI created a VPC as part of the activate process, it will be removed. If Amazon Genomics CLI created an S3 bucket for you, it will be retained.

To uninstall Amazon Genomics CLI from your local machine, run the following command:

./agc/uninstall.sh

Note that uninstalling the CLI will not remove any resources or persistent data from your AWS account.

Next Steps