Skip to content

Embedding Serving using TEI DLC

Production-ready Docker images for serving embedding, reranker, and sequence-classification models with Text Embeddings Inference (TEI) on Amazon SageMaker AI. Built and published by Hugging Face in collaboration with AWS.

TEI is a high-performance toolkit written in Rust for serving text embedding models, with dynamic batching, optimized transformer kernels (using Flash Attention, Candle, and cuBLASLt), and small, fast-booting images.

Images

Accelerator Image (us-east-1 example) Default Port
GPU 683313688378.dkr.ecr.us-east-1.amazonaws.com/tei:2.0.1-tei1.8.2-gpu-py310-cu122-ubuntu22.04 8080
CPU 683313688378.dkr.ecr.us-east-1.amazonaws.com/tei-cpu:2.0.1-tei1.8.2-cpu-py310-ubuntu22.04 8080

Unlike most AWS Deep Learning Containers, the TEI images are hosted in a different ECR account in each region. The simplest way to get the right URI is the SageMaker Python SDK helper, which resolves the account for your session's region:

from sagemaker.huggingface import get_huggingface_llm_image_uri

gpu_image = get_huggingface_llm_image_uri("huggingface-tei", version="1.8.2")
cpu_image = get_huggingface_llm_image_uri("huggingface-tei-cpu", version="1.8.2")

The per-region registry account IDs are listed below.

Region Account ID
af-south-1 510948584623
ap-east-1 651117190479
ap-northeast-1 354813040037
ap-northeast-2 366743142698
ap-northeast-3 867004704886
ap-south-1 720646828776
ap-south-2 628508329040
ap-southeast-1 121021644041
ap-southeast-2 783357654285
ap-southeast-3 951798379941
ap-southeast-4 106583098589
ca-central-1 341280168497
ca-west-1 190319476487
cn-north-1 450853457545
cn-northwest-1 451049120500
eu-central-1 492215442770
eu-central-2 680994064768
eu-north-1 662702820516
eu-south-1 978288397137
eu-south-2 104374241257
eu-west-1 141502667606
eu-west-2 764974769150
eu-west-3 659782779980
il-central-1 898809789911
me-central-1 272398656194
me-south-1 801668240914
sa-east-1 737474898029
us-east-1 683313688378
us-east-2 257758044811
us-gov-east-1 237065988967
us-gov-west-1 414596584902
us-iso-east-1 833128469047
us-isob-east-1 281123927165
us-west-1 746614075791
us-west-2 246618743249

API Endpoints

On Amazon SageMaker AI, all traffic goes through POST /invocations (with GET /ping for health checks). At startup, the container binds /invocations to the task matching the loaded model type:

Model Type Task Payload
Embedding Embeddings {"inputs": "..."} or {"inputs": ["...", "..."]}
Reranker Reranking {"query": "...", "texts": ["...", "..."]}
Classifier Sequence classification {"inputs": "..."}

Refer to the TEI API documentation for request/response schemas.

How They're Built

  • Released with TEI — image versions track upstream TEI releases and are published by Hugging Face to the regional SageMaker registries.
  • Discoverable via the SageMaker SDK — current versions are registered in the SageMaker Python SDK image URI config, so get_huggingface_llm_image_uri always resolves a valid URI.

For deployment walkthroughs, see Amazon SageMaker AI Deployment.