Changelog

Changelog for the Amazon Linux 2023-based SGLang images (server-cuda, server-sagemaker-cuda).

v1.2.0 - 2026-07-06

Tags: server-cuda-v1.2 · server-sagemaker-cuda-v1.2

SGLang source: bc8b3ab (0.5.14+amzn2023.bc8b3ab)

Bundled versions: CUDA 13.0.3 · Python 3.12 · PyTorch 2.11.0 · sgl-kernel 0.4.4 · FlashInfer 0.6.12 · Mooncake 0.3.11.post1 · NCCL 2.30.4

Bumped SGLang to 0.5.14 (upstream commit bc8b3ab)
Added support for NVIDIA LocateAnything-3B, a multimodal vision-grounding model that returns bounding boxes (<box>…</box>) for objects matching a text description. Bundled the decord, lmdb, and peft runtime dependencies required by the model's custom Hugging Face processors.
Upgraded EFA to 1.49.0
Upgraded stack: sgl-kernel 0.4.4, FlashInfer 0.6.12, Mooncake 0.3.11.post1, NCCL 2.30.4, gdrcopy 2.6, Rust 1.96.1

Allowlisted CVE-2026-27145 (Go stdlib x509 VerifyHostname CPU exhaustion): the affected code is embedded in Mooncake's libetcd_wrapper.so and cannot be patched without an upstream Mooncake rebuild with Go 1.26.4+

Tags: server-cuda-v1.1 · server-sagemaker-cuda-v1.1

SGLang source: 66ab5c9 (0.5.13+amzn2023.66ab5c9)

Bundled versions: CUDA 13.0.3 · Python 3.12 · PyTorch 2.11.0 · sgl-kernel 0.4.3 · FlashInfer 0.6.11.post1 · Mooncake 0.3.9 · NCCL 2.28.3

Bumped SGLang to 0.5.13 (upstream commit 66ab5c9)
Added NIXL KV connector (nixl + matching nixl-cu13) for prefill/decode disaggregation KV transfer
Added runai-model-streamer with the [s3,gcs,azure] extras for fast weight streaming from object storage (sglang[all] omits the [s3] extra)

Patched starlette advisory GHSA-82w8-qh3p-5jfq by pinning starlette>=1.3.1
Removed the build-only rust/ source tree from the runtime stage (only the compiled sglang-router binary ships); this resolves pyo3 advisory GHSA-36hh-v3qg-5jq4, which was flagged on the leftover Cargo.lock
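One way to confirm that an environment carries the starlette>=1.3.1 pin is to compare the installed version against that minimum. The following is a minimal sketch, not part of the images: the helper names (version_tuple, meets_minimum, starlette_is_patched) are hypothetical, and the comparison assumes plain dotted numeric versions, so a pre-release such as 1.3.1rc1 is conservatively treated as older than 1.3.1.

```python
from importlib import metadata


def version_tuple(version):
    """Return the leading numeric components, e.g. '1.3.1.post1' -> (1, 3, 1)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component (assumption: simple versions)
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def starlette_is_patched(minimum="1.3.1"):
    """Check the starlette distribution in the current environment, if any."""
    try:
        installed = metadata.version("starlette")
    except metadata.PackageNotFoundError:
        return False  # starlette is not installed here
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    print("starlette patched:", starlette_is_patched())
```

Running the script inside the container prints whether the environment satisfies the pin. For full PEP 440 semantics, a dedicated version-parsing library would be a better fit than this tuple comparison.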

Tags: server-cuda-v1.0 · server-sagemaker-cuda-v1.0

SGLang source: 578d27e (0.5.12+amzn2023.578d27e)

Bundled versions: CUDA 13.0.3 · Python 3.12 · PyTorch 2.11.0 · sgl-kernel 0.4.3 · FlashInfer 0.6.11.post1 · Mooncake 0.3.9 · NCCL 2.28.3

Initial release of SGLang Server containers on Amazon Linux 2023
Built from upstream SGLang source (not the pre-built lmsysorg/sglang image)
Simplified tag format: server-cuda[-vMAJOR[.MINOR[.PATCH]]]
OpenAI-compatible API server on port 30000 (EC2 / EKS) and 8080 (SageMaker)
Multi-GPU inference via tensor parallelism with NCCL
CUDA 13.0 build targeting H100 (sm_90) and Blackwell (sm_100, sm_103)
EFA support for multi-node deployments
DeepEP expert-parallel kernels and Mooncake KV-cache transfer bundled for large MoE models
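
The tag format server-cuda[-vMAJOR[.MINOR[.PATCH]]] can be read mechanically. The following is a minimal sketch, assuming the same optional version suffix applies to the server-sagemaker-cuda tags listed in this changelog; TAG_RE and parse_tag are hypothetical names used only for illustration.

```python
import re

# Optional "-vMAJOR", then optional ".MINOR", then optional ".PATCH".
# Assumption: server-sagemaker-cuda tags follow the same pattern as server-cuda.
TAG_RE = re.compile(
    r"(?P<image>server-(?:sagemaker-)?cuda)"
    r"(?:-v(?P<major>\d+)(?:\.(?P<minor>\d+)(?:\.(?P<patch>\d+))?)?)?"
)


def parse_tag(tag):
    """Split a tag into (image name, tuple of the version parts that are present)."""
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    version = tuple(
        int(match[name])
        for name in ("major", "minor", "patch")
        if match[name] is not None
    )
    return match["image"], version


if __name__ == "__main__":
    for tag in ("server-cuda", "server-cuda-v1", "server-sagemaker-cuda-v1.2"):
        print(tag, "->", parse_tag(tag))
```

A tag with fewer version parts yields a shorter tuple, so parse_tag("server-cuda") returns an empty version and parse_tag("server-cuda-v1.2") returns (1, 2).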