Changelog¶
Changelog for the Amazon Linux 2023-based SGLang images (server-cuda, server-sagemaker-cuda).
v1.0.0 — 2026-06-12¶
Tags: server-cuda-v1.0 · server-sagemaker-cuda-v1.0
SGLang source: 578d27e (0.5.12+amzn2023.578d27e)
Bundled versions: CUDA 13.0.3 · Python 3.12 · PyTorch 2.11.0 · sgl-kernel 0.4.3 · FlashInfer 0.6.11.post1 · Mooncake 0.3.9 · NCCL 2.28.3
Highlights¶
- Initial release of SGLang Server containers on Amazon Linux 2023
- Built from upstream SGLang source (not the pre-built
lmsysorg/sglangimage) - Simplified tag format:
server-cuda[-vMAJOR[.MINOR[.PATCH]]] - OpenAI-compatible API server on port 30000 (EC2 / EKS) and 8080 (SageMaker)
- Multi-GPU inference via tensor parallelism with NCCL
- CUDA 13.0 build targeting H100 (sm_90) and Blackwell (sm_100, sm_103)
- EFA support for multi-node deployments
- DeepEP expert-parallel kernels and Mooncake KV-cache transfer bundled for large MoE models