Using Deep Learning Containers¶
This page shows common deployment patterns across frameworks. For framework-specific deep dives, see the dedicated guides: vLLM, vLLM-Omni, Ray.
Additional Resources¶
- Use Your Own Algorithms or Models with Amazon SageMaker AI
- Orchestrating SageMaker HyperPod clusters with Amazon EKS and Amazon SageMaker AI
Running on Amazon SageMaker AI¶
Using SageMaker Python SDK¶
Deploy an SGLang inference endpoint:¶
from sagemaker.model import Model
model = Model(
image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/sglang:0.5.12-gpu-py312-cu130-ubuntu24.04-sagemaker",
role="arn:aws:iam::<account_id>:role/<role_name>",
env={
"SM_SGLANG_MODEL_PATH": "meta-llama/Llama-3.1-8B-Instruct",
"HF_TOKEN": "<your_hf_token>",
},
)
predictor = model.deploy(
instance_type="ml.g5.2xlarge",
initial_instance_count=1,
)
Deploy a vLLM inference endpoint:¶
from sagemaker.model import Model
model = Model(
image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/vllm:0.21.0-gpu-py312-cu130-ubuntu22.04-sagemaker",
role="arn:aws:iam::<account_id>:role/<role_name>",
env={
"SM_VLLM_MODEL": "meta-llama/Llama-3.1-8B-Instruct",
"HF_TOKEN": "<your_hf_token>",
},
)
predictor = model.deploy(
instance_type="ml.g5.2xlarge",
initial_instance_count=1,
)
Using Boto3¶
Deploy an SGLang inference endpoint:¶
import boto3
sagemaker = boto3.client("sagemaker")
sagemaker.create_model(
ModelName="sglang-model",
PrimaryContainer={
"Image": "763104351884.dkr.ecr.us-west-2.amazonaws.com/sglang:0.5.12-gpu-py312-cu130-ubuntu24.04-sagemaker",
"Environment": {
"SM_SGLANG_MODEL_PATH": "meta-llama/Llama-3.1-8B-Instruct",
"HF_TOKEN": "<your_hf_token>",
},
},
ExecutionRoleArn="arn:aws:iam::<account_id>:role/<role_name>",
)
sagemaker.create_endpoint_config(
EndpointConfigName="sglang-endpoint-config",
ProductionVariants=[
{
"VariantName": "default",
"ModelName": "sglang-model",
"InstanceType": "ml.g5.2xlarge",
"InitialInstanceCount": 1,
"InferenceAmiVersion": "al2-ami-sagemaker-inference-gpu-3-1",
}
],
)
sagemaker.create_endpoint(
EndpointName="sglang-endpoint",
EndpointConfigName="sglang-endpoint-config",
)
Deploy a vLLM inference endpoint:¶
import boto3
sagemaker = boto3.client("sagemaker")
sagemaker.create_model(
ModelName="vllm-model",
PrimaryContainer={
"Image": "763104351884.dkr.ecr.us-west-2.amazonaws.com/vllm:0.21.0-gpu-py312-cu130-ubuntu22.04-sagemaker",
"Environment": {
"SM_VLLM_MODEL": "meta-llama/Llama-3.1-8B-Instruct",
"HF_TOKEN": "<your_hf_token>",
},
},
ExecutionRoleArn="arn:aws:iam::<account_id>:role/<role_name>",
)
sagemaker.create_endpoint_config(
EndpointConfigName="vllm-endpoint-config",
ProductionVariants=[
{
"VariantName": "default",
"ModelName": "vllm-model",
"InstanceType": "ml.g5.2xlarge",
"InitialInstanceCount": 1,
"InferenceAmiVersion": "al2-ami-sagemaker-inference-gpu-3-1",
}
],
)
sagemaker.create_endpoint(
EndpointName="vllm-endpoint",
EndpointConfigName="vllm-endpoint-config",
)
Running on Amazon EC2¶
Running PyTorch Training Container on an EC2 Instance¶
# Run interactively
docker run -it --gpus all <account_id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag> bash
# Example: Run PyTorch container
docker run -it --gpus all 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.10.0-cpu-py313-ubuntu22.04-ec2 bash
# Mount local directories to persist data
docker run -it --gpus all -v /local/data:/data 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.10.0-cpu-py313-ubuntu22.04-ec2 bash
Quick Links¶
- Available Images - Browse all container images
- Support Policy - Framework versions and timelines
- vLLM Guide - Detailed vLLM deployment (EC2, SageMaker, EKS)
- Ray Guide - Ray Serve deployment with examples
- vLLM-Omni Guide - Multimodal serving (TTS, image, video)