Embedding Serving using TEI DLC

Production-ready Docker images for serving embedding, reranker, and sequence-classification models with Text Embeddings Inference (TEI) on Amazon SageMaker AI. Built and published by Hugging Face in collaboration with AWS.

TEI is a high-performance toolkit written in Rust for serving text embedding models, with dynamic batching, optimized transformer kernels (using Flash Attention, Candle, and cuBLASLt), and small, fast-booting images.

Images

Accelerator	Image (`us-east-1` example)	Default Port
GPU	`683313688378.dkr.ecr.us-east-1.amazonaws.com/tei:2.0.1-tei1.8.2-gpu-py310-cu122-ubuntu22.04`	8080
CPU	`683313688378.dkr.ecr.us-east-1.amazonaws.com/tei-cpu:2.0.1-tei1.8.2-cpu-py310-ubuntu22.04`	8080

Unlike most AWS Deep Learning Containers, the TEI images are hosted in a different ECR account in each region. The simplest way to get the right URI is the SageMaker Python SDK helper, which resolves the account for your session's region:

```python
from sagemaker.huggingface import get_huggingface_llm_image_uri

gpu_image = get_huggingface_llm_image_uri("huggingface-tei", version="1.8.2")
cpu_image = get_huggingface_llm_image_uri("huggingface-tei-cpu", version="1.8.2")
```

The per-region registry account IDs are listed below.

Region	Account ID
`af-south-1`	510948584623
`ap-east-1`	651117190479
`ap-northeast-1`	354813040037
`ap-northeast-2`	366743142698
`ap-northeast-3`	867004704886
`ap-south-1`	720646828776
`ap-south-2`	628508329040
`ap-southeast-1`	121021644041
`ap-southeast-2`	783357654285
`ap-southeast-3`	951798379941
`ap-southeast-4`	106583098589
`ca-central-1`	341280168497
`ca-west-1`	190319476487
`cn-north-1`	450853457545
`cn-northwest-1`	451049120500
`eu-central-1`	492215442770
`eu-central-2`	680994064768
`eu-north-1`	662702820516
`eu-south-1`	978288397137
`eu-south-2`	104374241257
`eu-west-1`	141502667606
`eu-west-2`	764974769150
`eu-west-3`	659782779980
`il-central-1`	898809789911
`me-central-1`	272398656194
`me-south-1`	801668240914
`sa-east-1`	737474898029
`us-east-1`	683313688378
`us-east-2`	257758044811
`us-gov-east-1`	237065988967
`us-gov-west-1`	414596584902
`us-iso-east-1`	833128469047
`us-isob-east-1`	281123927165
`us-west-1`	746614075791
`us-west-2`	246618743249
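If the SageMaker Python SDK is not available, the URI can also be assembled by hand from the table above. The following is a minimal sketch, not an official helper: `tei_image_uri` and `TEI_ACCOUNTS` are hypothetical names, only a few regions are copied in, and it assumes the repository names and tags from the `us-east-1` example apply unchanged in other regions. The China regions use the `amazonaws.com.cn` registry suffix. The ISO regions use other suffixes and are deliberately rejected here.

```python
# Subset of the per-region registry accounts listed above; extend as needed.
TEI_ACCOUNTS = {
    "us-east-1": "683313688378",
    "us-west-2": "246618743249",
    "eu-west-1": "141502667606",
    "cn-north-1": "450853457545",
}

# Repository and tag per accelerator, copied from the us-east-1 example.
# Assumption: other regions publish the same repository names and tags.
TEI_REPOS = {
    "gpu": ("tei", "2.0.1-tei1.8.2-gpu-py310-cu122-ubuntu22.04"),
    "cpu": ("tei-cpu", "2.0.1-tei1.8.2-cpu-py310-ubuntu22.04"),
}


def tei_image_uri(region: str, accelerator: str = "gpu") -> str:
    """Build a TEI image URI for a region (hypothetical helper)."""
    if region.startswith(("us-iso-", "us-isob-")):
        # ISO partitions use different registry suffixes; not handled here.
        raise ValueError(f"unsupported partition for region {region!r}")
    if region not in TEI_ACCOUNTS:
        raise ValueError(f"no account ID recorded for region {region!r}")
    if accelerator not in TEI_REPOS:
        raise ValueError(f"accelerator must be 'gpu' or 'cpu', got {accelerator!r}")

    account = TEI_ACCOUNTS[region]
    repo, tag = TEI_REPOS[accelerator]
    # China regions live in a separate partition with its own DNS suffix.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account}.dkr.ecr.{region}.{suffix}/{repo}:{tag}"
```

Calling `tei_image_uri("us-east-1")` reproduces the GPU URI shown in the Images table. The SDK helper remains the safer choice, since it tracks the versions actually registered for each region.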

API Endpoints

On Amazon SageMaker AI, all inference traffic goes through `POST /invocations`, and `GET /ping` serves health checks. At startup, the container binds `/invocations` to the task matching the loaded model type:

Model Type	Task	Payload
Embedding	Embeddings	`{"inputs": "..."}` or `{"inputs": ["...", "..."]}`
Reranker	Reranking	`{"query": "...", "texts": ["...", "..."]}`
Classifier	Sequence classification	`{"inputs": "..."}`

Refer to the TEI API documentation for request/response schemas.
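The payload shapes in the table above can be sent to a deployed endpoint through the SageMaker runtime API. The following is a minimal sketch under stated assumptions: `build_payload` and `invoke` are hypothetical helpers, `runtime_client` is expected to be a `boto3.client("sagemaker-runtime")` instance, and the endpoint name in the usage comment is a placeholder. The response is returned as parsed JSON without assuming its schema.

```python
import json


def build_payload(task, *, inputs=None, query=None, texts=None):
    """Return the JSON body for a task, following the table above."""
    if task in ("embedding", "classifier"):
        if inputs is None:
            raise ValueError(f"{task} requests need 'inputs'")
        # 'inputs' may be a single string or, for embeddings, a list of strings.
        return {"inputs": inputs}
    if task == "reranker":
        if query is None or not texts:
            raise ValueError("reranker requests need 'query' and 'texts'")
        return {"query": query, "texts": list(texts)}
    raise ValueError(f"unknown task {task!r}")


def invoke(runtime_client, endpoint_name, payload):
    """POST a payload to /invocations via the SageMaker runtime client."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it and decode the JSON document.
    return json.loads(response["Body"].read())


# Usage (requires boto3, AWS credentials, and a deployed endpoint):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   payload = build_payload("embedding", inputs=["first text", "second text"])
#   result = invoke(client, "my-tei-endpoint", payload)  # placeholder name
```

Passing the client in as an argument keeps the helper independent of how the session and credentials are configured.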

How They're Built

Released with TEI: image versions track upstream TEI releases and are published by Hugging Face to the regional SageMaker registries.
Discoverable via the SageMaker SDK: current versions are registered in the SageMaker Python SDK image URI config, so `get_huggingface_llm_image_uri` resolves a valid URI for any registered version.

For deployment walkthroughs, see Amazon SageMaker AI Deployment.