Using Deep Learning Containers
The following sections describe how to use Deep Learning Containers to run sample code for each framework on AWS infrastructure.
Use Cases

- For information on using Deep Learning Containers with Amazon SageMaker AI, see Use Your Own Algorithms or Models with Amazon SageMaker AI in the SageMaker documentation.
- To learn about using Deep Learning Containers with Amazon SageMaker AI HyperPod on Amazon EKS, see Orchestrating SageMaker HyperPod clusters with Amazon EKS and Amazon SageMaker AI.
Running on Amazon SageMaker AI

Using the SageMaker Python SDK

Deploy an SGLang inference endpoint:
from sagemaker.model import Model

# Create a Model that points at the SGLang Deep Learning Container image
model = Model(
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/sglang:0.5.8-gpu-py312-cu129-ubuntu24.04-sagemaker",
    role="arn:aws:iam::<account_id>:role/<role_name>",
    env={
        "SM_SGLANG_MODEL_PATH": "meta-llama/Llama-3.1-8B-Instruct",
        "HF_TOKEN": "<your_hf_token>",
    },
)

# Deploy the model to a real-time endpoint on a GPU instance
predictor = model.deploy(
    instance_type="ml.g5.2xlarge",
    initial_instance_count=1,
)
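Once the endpoint is in service, you can send it requests. A minimal sketch that attaches a Predictor from the SageMaker Python SDK to the endpoint created above; the request body assumes an OpenAI-style chat schema, which this page does not confirm:

from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Attach a Predictor to the deployed endpoint and exchange JSON
predictor = Predictor(
    endpoint_name=model.endpoint_name,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# Payload shape is an assumption: an OpenAI-style chat request
response = predictor.predict({
    "messages": [{"role": "user", "content": "What is Amazon SageMaker AI?"}],
    "max_tokens": 128,
})
print(response)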
Deploy a vLLM inference endpoint:
from sagemaker.model import Model

# Create a Model that points at the vLLM Deep Learning Container image
model = Model(
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/vllm:0.14.0-gpu-py312-cu129-ubuntu22.04-sagemaker",
    role="arn:aws:iam::<account_id>:role/<role_name>",
    env={
        "SM_VLLM_MODEL": "meta-llama/Llama-3.1-8B-Instruct",
        "HF_TOKEN": "<your_hf_token>",
    },
)

# Deploy the model to a real-time endpoint on a GPU instance
predictor = model.deploy(
    instance_type="ml.g5.2xlarge",
    initial_instance_count=1,
)
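When you are done experimenting, delete the endpoint and model to stop incurring charges. A short sketch using a Predictor pointed at the endpoint created above:

from sagemaker.predictor import Predictor

# Tear down the model, then the endpoint and its endpoint config
cleanup = Predictor(endpoint_name=model.endpoint_name)
cleanup.delete_model()
cleanup.delete_endpoint()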
Using Boto3

Deploy an SGLang inference endpoint:
import boto3

sagemaker = boto3.client("sagemaker")

# Register the SGLang Deep Learning Container image as a SageMaker model
sagemaker.create_model(
    ModelName="sglang-model",
    PrimaryContainer={
        "Image": "763104351884.dkr.ecr.us-west-2.amazonaws.com/sglang:0.5.8-gpu-py312-cu129-ubuntu24.04-sagemaker",
        "Environment": {
            "SM_SGLANG_MODEL_PATH": "meta-llama/Llama-3.1-8B-Instruct",
            "HF_TOKEN": "<your_hf_token>",
        },
    },
    ExecutionRoleArn="arn:aws:iam::<account_id>:role/<role_name>",
)

# Define the instance type, count, and inference AMI for the endpoint
sagemaker.create_endpoint_config(
    EndpointConfigName="sglang-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "default",
            "ModelName": "sglang-model",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
            "InferenceAmiVersion": "al2-ami-sagemaker-inference-gpu-3-1",
        }
    ],
)

# Create the endpoint from the configuration
sagemaker.create_endpoint(
    EndpointName="sglang-endpoint",
    EndpointConfigName="sglang-endpoint-config",
)
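create_endpoint returns immediately while the endpoint is still provisioning. A short sketch that blocks until the endpoint is in service, using the built-in boto3 waiter and the sagemaker client created above:

# Block until the endpoint finishes provisioning (raises if it fails)
waiter = sagemaker.get_waiter("endpoint_in_service")
waiter.wait(EndpointName="sglang-endpoint")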
Deploy a vLLM inference endpoint:
import boto3

sagemaker = boto3.client("sagemaker")

# Register the vLLM Deep Learning Container image as a SageMaker model
sagemaker.create_model(
    ModelName="vllm-model",
    PrimaryContainer={
        "Image": "763104351884.dkr.ecr.us-west-2.amazonaws.com/vllm:0.14.0-gpu-py312-cu129-ubuntu22.04-sagemaker",
        "Environment": {
            "SM_VLLM_MODEL": "meta-llama/Llama-3.1-8B-Instruct",
            "HF_TOKEN": "<your_hf_token>",
        },
    },
    ExecutionRoleArn="arn:aws:iam::<account_id>:role/<role_name>",
)

# Define the instance type, count, and inference AMI for the endpoint
sagemaker.create_endpoint_config(
    EndpointConfigName="vllm-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "default",
            "ModelName": "vllm-model",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
            "InferenceAmiVersion": "al2-ami-sagemaker-inference-gpu-3-1",
        }
    ],
)

# Create the endpoint from the configuration
sagemaker.create_endpoint(
    EndpointName="vllm-endpoint",
    EndpointConfigName="vllm-endpoint-config",
)
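Once the endpoint is in service, invoke it through the SageMaker Runtime client. A minimal sketch; the request body assumes an OpenAI-style chat schema, which may differ depending on the container's serving configuration:

import json

runtime = boto3.client("sagemaker-runtime")

# Request body is an assumption: an OpenAI-style chat payload
response = runtime.invoke_endpoint(
    EndpointName="vllm-endpoint",
    ContentType="application/json",
    Body=json.dumps({
        "messages": [{"role": "user", "content": "What is Amazon SageMaker AI?"}],
        "max_tokens": 128,
    }),
)
print(json.loads(response["Body"].read()))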
Running on Amazon EC2

Running the PyTorch Training Container on an EC2 Instance

# Run interactively; --gpus all requires a GPU instance with the NVIDIA Container Toolkit installed
docker run -it --gpus all <account_id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag> bash

# Example: run the CPU PyTorch training container (omit --gpus all for CPU-only images)
docker run -it 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.9.0-cpu-py312-ubuntu22.04-ec2 bash

# Mount a local directory into the container to persist data
docker run -it -v /local/data:/data 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.9.0-cpu-py312-ubuntu22.04-ec2 bash
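Pulling these images requires authenticating Docker to the Deep Learning Containers registry first. A quick sanity check that the framework is importable, using the same image as above:

# Log Docker in to the Deep Learning Containers ECR registry (adjust the region as needed)
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

# Verify the framework inside the container, then exit
docker run --rm 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.9.0-cpu-py312-ubuntu22.04-ec2 python -c "import torch; print(torch.__version__)"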
Quick Links
- Available Images - Browse all container images
- Support Policy - Framework versions and timelines