Qwen3.6 35B A3B vLLM recipes
Helm recipes
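The recipes below are Helm values for the mixa3607/charts/vllm chart. Each one sets an image, a few environment variables, and a vllm.yaml config, so the same settings can be carried over to a plain Docker setup.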
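One way to port a recipe to Docker is to flatten its vllm.yaml keys into `vllm serve` command-line flags and pass the env vars with `-e`. The sketch below is a minimal illustration of that idea, not part of the chart. The helper names `config_to_args` and `docker_command` are hypothetical, and the `--device`, `--ipc` and `-p 8000:8000` options are the usual ROCm container and vLLM default-port settings, assumed here rather than taken from the recipes. It also assumes the image accepts `vllm serve ...` as its command, which should be checked against the image's entrypoint.

```python
import shlex


def config_to_args(config):
    """Flatten a vllm.yaml-style mapping into `vllm serve` arguments.

    The model is passed as the positional argument; every other key
    becomes a `--key value` flag. (Hypothetical helper for illustration.)
    """
    args = []
    if "model" in config:
        args.append(str(config["model"]))
    for key, value in config.items():
        if key == "model":
            continue
        flag = f"--{key}"
        if isinstance(value, bool):
            # True becomes a bare flag; False is dropped in this sketch.
            if value:
                args.append(flag)
        elif isinstance(value, (list, tuple)):
            # List values follow the flag as separate arguments.
            args.append(flag)
            args.extend(str(item) for item in value)
        else:
            args.extend([flag, str(value)])
    return args


def docker_command(image, config, env):
    """Build a `docker run` argument list (device flags are assumptions)."""
    cmd = [
        "docker", "run", "--rm",
        "--device=/dev/kfd", "--device=/dev/dri",  # assumed ROCm GPU access
        "--ipc=host",
        "-p", "8000:8000",  # assumed: vLLM's default port
    ]
    for name, value in env.items():
        cmd += ["-e", f"{name}={value}"]
    cmd.append(image)
    # Assumption: the image runs the given command instead of a fixed entrypoint.
    cmd += ["vllm", "serve"] + config_to_args(config)
    return cmd


# A shortened version of recipe A, written as a Python dict.
recipe_config = {
    "model": "Qwen/Qwen3.6-35B-A3B",
    "served-model-name": ["Qwen3.6-35B-A3B", "Qwen/Qwen3.6-35B-A3B"],
    "max-model-len": "200K",
    "tensor-parallel-size": 4,
    "enable-prefix-caching": True,
}
recipe_env = {"FLASH_ATTENTION_TRITON_AMD_ENABLE": "TRUE"}
recipe_image = "docker.io/mixa3607/vllm-gfx906:0.19.1-rocm-7.2.1-aiinfos"

print(shlex.join(docker_command(recipe_image, recipe_config, recipe_env)))
```

The printed line is a starting point to review by hand. Model cache mounts, Hugging Face tokens, and any chart-specific defaults are not covered.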
A. Qwen/Qwen3.6-35B-A3B
yaml setup
image:
  registry: docker.io
  repository: mixa3607/vllm-gfx906
  tag: 0.19.1-rocm-7.2.1-aiinfos
extraEnvVars:
  - name: FLASH_ATTENTION_TRITON_AMD_ENABLE
    value: "TRUE"
  - name: VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS
    value: "1"
  - name: RECIPE
    value: "Setup A"
app-configuration:
  vllm.yaml: |-
    model: Qwen/Qwen3.6-35B-A3B
    served-model-name:
      - Qwen3.6-35B-A3B
      - Qwen/Qwen3.6-35B-A3B
    async-scheduling: true
    trust-remote-code: true
    enable-auto-tool-choice: true
    reasoning-parser: qwen3
    tool-call-parser: qwen3_coder
    max-model-len: 200K
    tensor-parallel-size: 4
    data-parallel-size: 1
    dtype: float16
    gpu-memory-utilization: 0.95
    limit-mm-per-prompt.image: 20
    limit-mm-per-prompt.video: 4
    max-num-seqs: 16
    enable-expert-parallel: true
    enable-prefix-caching: true
B. cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit
yaml setup
image:
  registry: docker.io
  repository: mixa3607/vllm-gfx906
  tag: 0.19.1-rocm-7.2.1-aiinfos
extraEnvVars:
  - name: FLASH_ATTENTION_TRITON_AMD_ENABLE
    value: "TRUE"
  - name: VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS
    value: "1"
  - name: RECIPE
    value: "Setup B"
app-configuration:
  vllm.yaml: |-
    model: cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit
    served-model-name:
      - Qwen3.6-35B-A3B
      - cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit
    async-scheduling: true
    trust-remote-code: true
    enable-auto-tool-choice: true
    reasoning-parser: qwen3
    tool-call-parser: qwen3_coder
    max-model-len: 200K
    tensor-parallel-size: 4
    data-parallel-size: 1
    dtype: float16
    gpu-memory-utilization: 0.95
    limit-mm-per-prompt.image: 20
    limit-mm-per-prompt.video: 4
    max-num-seqs: 16
    enable-expert-parallel: true
    enable-prefix-caching: true