Using Deep Learning Containers

The following sections describe how to use AWS Deep Learning Containers (DLCs) to run sample code from each of the supported frameworks on AWS infrastructure.

Use Cases

Running on Amazon SageMaker AI

Using the SageMaker Python SDK

Deploy an SGLang inference endpoint:

from sagemaker.model import Model

model = Model(
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/sglang:0.5.8-gpu-py312-cu129-ubuntu24.04-sagemaker",
    role="arn:aws:iam::<account_id>:role/<role_name>",
    env={
        "SM_SGLANG_MODEL_PATH": "meta-llama/Llama-3.1-8B-Instruct",
        "HF_TOKEN": "<your_hf_token>",
    },
)

predictor = model.deploy(
    instance_type="ml.g5.2xlarge",
    initial_instance_count=1,
)
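You can then invoke the endpoint through the returned predictor. The sketch below assumes the container exposes an OpenAI-style chat completions payload on the default invocations path; adjust the request body to match your serving configuration:

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Send and receive JSON on the endpoint's default invocations path
predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

# Assumed OpenAI-style chat payload; the exact schema depends on the container
response = predictor.predict({
    "messages": [{"role": "user", "content": "What is Amazon SageMaker AI?"}],
    "max_tokens": 128,
})
print(response)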

Deploy a vLLM inference endpoint:

from sagemaker.model import Model

model = Model(
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/vllm:0.14.0-gpu-py312-cu129-ubuntu22.04-sagemaker",
    role="arn:aws:iam::<account_id>:role/<role_name>",
    env={
        "SM_VLLM_MODEL": "meta-llama/Llama-3.1-8B-Instruct",
        "HF_TOKEN": "<your_hf_token>",
    },
)

predictor = model.deploy(
    instance_type="ml.g5.2xlarge",
    initial_instance_count=1,
)
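When you are finished with either endpoint, delete it to stop incurring charges. predictor.delete_endpoint() also removes the endpoint configuration by default:

# Tear down the endpoint, its configuration, and the model
predictor.delete_endpoint()
model.delete_model()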

Using Boto3

Deploy an SGLang inference endpoint:

import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_model(
    ModelName="sglang-model",
    PrimaryContainer={
        "Image": "763104351884.dkr.ecr.us-west-2.amazonaws.com/sglang:0.5.8-gpu-py312-cu129-ubuntu24.04-sagemaker",
        "Environment": {
            "SM_SGLANG_MODEL_PATH": "meta-llama/Llama-3.1-8B-Instruct",
            "HF_TOKEN": "<your_hf_token>",
        },
    },
    ExecutionRoleArn="arn:aws:iam::<account_id>:role/<role_name>",
)

sagemaker.create_endpoint_config(
    EndpointConfigName="sglang-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "default",
            "ModelName": "sglang-model",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
            "InferenceAmiVersion": "al2-ami-sagemaker-inference-gpu-3-1",
        }
    ],
)

sagemaker.create_endpoint(
    EndpointName="sglang-endpoint",
    EndpointConfigName="sglang-endpoint-config",
)
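Endpoint creation is asynchronous, so wait for the endpoint to reach InService before invoking it. The invocation sketch below assumes the same OpenAI-style chat payload as the SDK example above; adjust the body to your container's schema:

import json

# Block until the endpoint is ready to serve traffic
sagemaker.get_waiter("endpoint_in_service").wait(EndpointName="sglang-endpoint")

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="sglang-endpoint",
    ContentType="application/json",
    Body=json.dumps({
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    }),
)
print(response["Body"].read().decode("utf-8"))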

Deploy a vLLM inference endpoint:

import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_model(
    ModelName="vllm-model",
    PrimaryContainer={
        "Image": "763104351884.dkr.ecr.us-west-2.amazonaws.com/vllm:0.14.0-gpu-py312-cu129-ubuntu22.04-sagemaker",
        "Environment": {
            "SM_VLLM_MODEL": "meta-llama/Llama-3.1-8B-Instruct",
            "HF_TOKEN": "<your_hf_token>",
        },
    },
    ExecutionRoleArn="arn:aws:iam::<account_id>:role/<role_name>",
)

sagemaker.create_endpoint_config(
    EndpointConfigName="vllm-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "default",
            "ModelName": "vllm-model",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
            "InferenceAmiVersion": "al2-ami-sagemaker-inference-gpu-3-1",
        }
    ],
)

sagemaker.create_endpoint(
    EndpointName="vllm-endpoint",
    EndpointConfigName="vllm-endpoint-config",
)
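To clean up the resources created by either Boto3 example (shown here with the vLLM names), delete them in the reverse order of creation:

# Remove the endpoint, the endpoint configuration, and the model
sagemaker.delete_endpoint(EndpointName="vllm-endpoint")
sagemaker.delete_endpoint_config(EndpointConfigName="vllm-endpoint-config")
sagemaker.delete_model(ModelName="vllm-model")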

Running on Amazon EC2

Running PyTorch Training Container on an EC2 Instance
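Before pulling an image, authenticate your Docker client to the registry that hosts it. The example below uses the public DLC account (763104351884) in us-west-2; substitute the account and region of your image:

aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com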

# Run interactively (--gpus all requires a GPU image and the NVIDIA Container Toolkit on the host)
docker run -it --gpus all <account_id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag> bash

# Example: run the CPU PyTorch training container (no --gpus flag needed)
docker run -it 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.9.0-cpu-py312-ubuntu22.04-ec2 bash

# Mount a local directory into the container to persist data
docker run -it -v /local/data:/data 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.9.0-cpu-py312-ubuntu22.04-ec2 bash
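Once inside the container, a quick smoke test confirms that the framework loads (a minimal check, assuming the image's default Python environment):

python -c "import torch; print(torch.__version__)"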