Skip to content

Supported Models

All models listed below are regression-tested on every DLC SGLang release and work with the images listed on the Overview page.

The Coverage column indicates test depth: Smoke runs on every PR; Benchmark runs throughput and latency tests with pass/fail thresholds before release. A Smoke + Benchmark tag means both apply.

Tested Models

Family Model Coverage
Llama meta-llama/Llama-3.3-70B-Instruct Benchmark
Qwen Qwen/Qwen3-32B Benchmark
Qwen/Qwen3.5-0.8B Smoke + Benchmark
Qwen/Qwen3.5-9B Benchmark
Qwen/Qwen3.5-27B-FP8 Benchmark
Qwen/Qwen3.5-35B-A3B-FP8 Benchmark
Qwen/Qwen3-Coder-Next-FP8 Benchmark
GPT-OSS openai/gpt-oss-20b Benchmark
DeepSeek deepseek-ai/DeepSeek-V4-Flash Benchmark

Custom Models

Any model supported by upstream SGLang should work. To serve a model not listed above:

docker run --gpus all -p 30000:30000 \
  public.ecr.aws/deep-learning-containers/sglang:server-cuda \
  --model-path <org>/<model-name>

Models can also be loaded from a local path (-v /path:/model --model-path /model). See the SGLang supported models list for the full upstream coverage.