ROCm version × HIP graphs × GPU count × context depth bench (llama.cpp, 4× gfx906)

More info
topo
```
========================= ROCm System Management Interface =========================
============================= Weight between two GPUs ==============================
       GPU0   GPU1   GPU2   GPU3
GPU0   0      40     40     40
GPU1   40     0      40     40
GPU2   40     40     0      40
GPU3   40     40     40     0

============================== Hops between two GPUs ===============================
       GPU0   GPU1   GPU2   GPU3
GPU0   0      2      2      2
GPU1   2      0      2      2
GPU2   2      2      0      2
GPU3   2      2      2      0

============================ Link Type between two GPUs ============================
       GPU0   GPU1   GPU2   GPU3
GPU0   0      PCIE   PCIE   PCIE
GPU1   PCIE   0      PCIE   PCIE
GPU2   PCIE   PCIE   0      PCIE
GPU3   PCIE   PCIE   PCIE   0

==================================== Numa Nodes ====================================
GPU[0] : (Topology) Numa Node: 0
GPU[0] : (Topology) Numa Affinity: 0
GPU[1] : (Topology) Numa Node: 0
GPU[1] : (Topology) Numa Affinity: 0
GPU[2] : (Topology) Numa Node: 0
GPU[2] : (Topology) Numa Affinity: 0
GPU[3] : (Topology) Numa Node: 0
GPU[3] : (Topology) Numa Affinity: 0
=============================== End of ROCm SMI Log ================================
```
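
All four GPUs report NUMA node 0, which lines up with the `numactl --membind=0 --cpunodebind=0` used for the bench. A small sketch that derives those flags from the topology text, assuming it comes from `rocm-smi --showtopo` in the format shown above (`numa_flags` is a hypothetical helper name, not part of any tool):

```bash
#!/usr/bin/env bash
# Collect the distinct "Numa Node" values from `rocm-smi --showtopo`-style text
# on stdin. If every GPU sits on one node, print matching numactl flags;
# otherwise report the nodes on stderr and return 1.
numa_flags() {
  local nodes count
  nodes="$(awk -F'Numa Node: ' 'NF == 2 { gsub(/[[:space:]]/, "", $2); print $2 }' | sort -u)"
  count="$(printf '%s\n' "$nodes" | grep -c .)"
  if [ "$count" -eq 1 ]; then
    printf -- '--membind=%s --cpunodebind=%s\n' "$nodes" "$nodes"
  else
    echo "GPUs span NUMA nodes:" $nodes >&2
    return 1
  fi
}

# Demo on the NUMA lines from the output above; prints --membind=0 --cpunodebind=0
numa_flags <<'EOF'
GPU[0] : (Topology) Numa Node: 0
GPU[1] : (Topology) Numa Node: 0
GPU[2] : (Topology) Numa Node: 0
GPU[3] : (Topology) Numa Node: 0
EOF
```

On a live box the same helper would be fed with `rocm-smi --showtopo | numa_flags`; the "Numa Affinity" lines are ignored because they do not match the `Numa Node: ` separator.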

pcie

```
31:00.0
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <64us
LnkSta: Speed 16GT/s, Width x8 (downgraded)
34:00.0
LnkCap: Port #2, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <64us
LnkSta: Speed 16GT/s, Width x8 (downgraded)
4b:00.0
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <64us
LnkSta: Speed 16GT/s, Width x8 (downgraded)
4e:00.0
LnkCap: Port #2, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <64us
LnkSta: Speed 16GT/s, Width x8 (downgraded)
```
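
LnkCap is what each card supports (16 GT/s, x16, i.e. PCIe 4.0) and LnkSta is what was negotiated: all four links run at x8, so each card gets roughly half its possible link bandwidth. A quick back-of-the-envelope sketch of the per-direction numbers, assuming 128b/130b encoding (used from PCIe 3.0 onward) and ignoring packet/protocol overhead; `pcie_gbps` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Rough per-direction PCIe bandwidth in GB/s for a given rate (GT/s) and width.
# Assumes 128b/130b encoding (PCIe 3.0 and newer) and ignores TLP/protocol
# overhead, so real transfer rates will be lower.
pcie_gbps() {
  awk -v g="$1" -v l="$2" 'BEGIN { printf "%.2f\n", g * l * 128 / 130 / 8 }'
}

echo "LnkCap 16GT/s x16: $(pcie_gbps 16 16) GB/s"  # 31.51
echo "LnkSta 16GT/s x8:  $(pcie_gbps 16 8) GB/s"   # 15.75
```

That x8 link is what the multi-GPU `--split-mode tensor` runs share between cards, though the bench below does not isolate how much it costs.
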
llama.cpp ver
flags:
- GGML_HIP=ON
- GGML_HIP_GRAPHS=OFF
- GGML_HIP_RCCL=ON
- AMDGPU_TARGETS=gfx906
- GGML_BACKEND_DL=ON
- GGML_CPU_ALL_VARIANTS=ON
- GGML_AVX512=ON
- GGML_AVX512_VBMI=ON
- GGML_AVX512_VNNI=ON
- GGML_AVX512_BF16=ON
- CMAKE_BUILD_TYPE=Release
- LLAMA_BUILD_TESTS=OFF
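
These are CMake options. The flags list has `GGML_HIP_GRAPHS=OFF`, yet the table also has HIP_GRAPHS=ON rows, so presumably a second build was configured with only that flag flipped. A sketch that prints (does not run) both configure commands; the `HIPCXX`/`HIP_PATH` prefix follows the pattern in llama.cpp's HIP build docs, and the `build-graphs-*` directory names and `configure_cmd` helper are hypothetical:

```bash
#!/usr/bin/env bash
# Print (not run) the CMake configure command for one build variant.
# $1 = ON or OFF for GGML_HIP_GRAPHS; every other flag mirrors the list above.
configure_cmd() {
  local graphs="$1"
  local flags=(
    -DGGML_HIP=ON
    "-DGGML_HIP_GRAPHS=${graphs}"
    -DGGML_HIP_RCCL=ON
    -DAMDGPU_TARGETS=gfx906
    -DGGML_BACKEND_DL=ON
    -DGGML_CPU_ALL_VARIANTS=ON
    -DGGML_AVX512=ON
    -DGGML_AVX512_VBMI=ON
    -DGGML_AVX512_VNNI=ON
    -DGGML_AVX512_BF16=ON
    -DCMAKE_BUILD_TYPE=Release
    -DLLAMA_BUILD_TESTS=OFF
  )
  # Compiler setup assumed from the llama.cpp HIP build docs (needs hipconfig on PATH).
  echo "HIPCXX=\"\$(hipconfig -l)/clang\" HIP_PATH=\"\$(hipconfig -R)\" cmake -S . -B build-graphs-${graphs} ${flags[*]}"
}

# One command per HIP graphs variant, to be run from a llama.cpp checkout.
for g in OFF ON; do configure_cmd "$g"; done
```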

```
ggml_cuda_init: found 4 ROCm devices (Total VRAM: 131008 MiB):
Device 0: AMD Instinct MI60 / MI50, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64, VRAM: 32752 MiB
Device 1: AMD Instinct MI60 / MI50, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64, VRAM: 32752 MiB
Device 2: AMD Instinct MI60 / MI50, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64, VRAM: 32752 MiB
Device 3: AMD Instinct MI60 / MI50, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64, VRAM: 32752 MiB
load_backend: loaded ROCm backend from /app/libggml-hip.so
load_backend: loaded CPU backend from /app/libggml-cpu-icelake.so
build: 2555826 (1)
```

Bench

```bash
numactl --membind=0 --cpunodebind=0 \
./llama-bench \
--hf-repo unsloth/Qwen3.5-9B-GGUF:Q8_0 \
--device rocm0,rocm0/rocm1,rocm0/rocm1/rocm2/rocm3 \
--split-mode tensor --flash-attn 1 \
--n-prompt 2048 --ubatch-size 2048 --batch-size 2048 \
--n-gen 256 \
--n-depth 0,16384,32768
```
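
llama-bench runs every combination of its list-valued parameters, and within one `--device` entry the devices are `/`-separated. A sketch that enumerates the grid implied by the command above (the `list_runs` helper and its output format are illustrative only): 3 device sets × 2 tests × 3 depths = 18 runs per build, and 4 builds (2 ROCm versions × HIP graphs OFF/ON) give the 72 rows in the table.

```bash
#!/usr/bin/env bash
# Enumerate the runs one llama-bench invocation above produces: every device
# set x every depth x every test (llama-bench benchmarks the cross product).
devices=(rocm0 rocm0/rocm1 rocm0/rocm1/rocm2/rocm3)
tests=(pp2048 tg256)
depths=(0 16384 32768)

list_runs() {
  local d t n
  for d in "${devices[@]}"; do
    for n in "${depths[@]}"; do
      for t in "${tests[@]}"; do
        # GPU count = number of '/'-separated device names in the set.
        printf '%s gpus=%s %s ctx=%s\n' "$d" "$(awk -F/ '{ print NF }' <<<"$d")" "$t" "$n"
      done
    done
  done
}

list_runs
echo "runs per build: $(list_runs | wc -l | tr -d ' ')"   # 18
echo "rows for 4 builds: $(( 4 * $(list_runs | wc -l) ))" # 72
```
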
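
| model | ROCm | HIP_GRAPHS | GPUs | test | ctx | t/s | ± |
|---|---|---|---:|---|---:|---:|---:|
| qwen35 9B Q8_0 | 7.2.3 | OFF | 1 | pp2048 | 0 | 507.79 | 0.19 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 1 | tg256 | 0 | 58.83 | 0.18 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 1 | pp2048 | 16384 | 445.13 | 0.65 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 1 | tg256 | 16384 | 56.01 | 0.25 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 1 | pp2048 | 32768 | 401.25 | 0.66 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 1 | tg256 | 32768 | 53.47 | 0.22 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 2 | pp2048 | 0 | 918.78 | 0.19 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 2 | tg256 | 0 | 70.13 | 0.08 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 2 | pp2048 | 16384 | 791.19 | 2.97 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 2 | tg256 | 16384 | 68.09 | 0.35 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 2 | pp2048 | 32768 | 720.33 | 2.47 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 2 | tg256 | 32768 | 66.05 | 0.34 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 4 | pp2048 | 0 | 1519.30 | 0.18 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 4 | tg256 | 0 | 73.16 | 0.35 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 4 | pp2048 | 16384 | 1348.31 | 8.42 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 4 | tg256 | 16384 | 71.02 | 0.39 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 4 | pp2048 | 32768 | 1212.53 | 6.94 |
| qwen35 9B Q8_0 | 7.2.3 | OFF | 4 | tg256 | 32768 | 70.42 | 0.40 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 1 | pp2048 | 0 | 517.96 | 0.21 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 1 | tg256 | 0 | 53.51 | 0.15 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 1 | pp2048 | 16384 | 454.60 | 0.98 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 1 | tg256 | 16384 | 51.19 | 0.20 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 1 | pp2048 | 32768 | 409.04 | 1.07 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 1 | tg256 | 32768 | 49.01 | 0.18 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 2 | pp2048 | 0 | 936.04 | 0.22 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 2 | tg256 | 0 | 66.71 | 0.23 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 2 | pp2048 | 16384 | 804.81 | 2.05 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 2 | tg256 | 16384 | 64.66 | 0.33 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 2 | pp2048 | 32768 | 734.61 | 2.66 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 2 | tg256 | 32768 | 62.52 | 0.16 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 4 | pp2048 | 0 | 1542.16 | 0.30 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 4 | tg256 | 0 | 72.73 | 0.48 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 4 | pp2048 | 16384 | 1368.46 | 8.75 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 4 | tg256 | 16384 | 71.30 | 0.90 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 4 | pp2048 | 32768 | 1232.13 | 7.15 |
| qwen35 9B Q8_0 | 6.3.3 | OFF | 4 | tg256 | 32768 | 71.98 | 0.43 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 1 | pp2048 | 0 | 507.63 | 0.34 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 1 | tg256 | 0 | 59.31 | 0.23 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 1 | pp2048 | 16384 | 445.03 | 0.88 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 1 | tg256 | 16384 | 56.28 | 0.23 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 1 | pp2048 | 32768 | 401.90 | 0.14 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 1 | tg256 | 32768 | 53.81 | 0.24 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 2 | pp2048 | 0 | 918.25 | 0.88 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 2 | tg256 | 0 | 71.99 | 0.26 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 2 | pp2048 | 16384 | 790.92 | 2.60 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 2 | tg256 | 16384 | 69.97 | 0.45 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 2 | pp2048 | 32768 | 720.64 | 2.29 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 2 | tg256 | 32768 | 67.82 | 0.42 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 4 | pp2048 | 0 | 1521.34 | 3.65 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 4 | tg256 | 0 | 76.25 | 0.48 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 4 | pp2048 | 16384 | 1344.43 | 9.40 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 4 | tg256 | 16384 | 74.33 | 0.75 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 4 | pp2048 | 32768 | 1212.61 | 7.98 |
| qwen35 9B Q8_0 | 7.2.3 | ON | 4 | tg256 | 32768 | 73.31 | 0.78 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 1 | pp2048 | 0 | 517.52 | 0.82 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 1 | tg256 | 0 | 52.31 | 0.34 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 1 | pp2048 | 16384 | 454.43 | 0.64 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 1 | tg256 | 16384 | 50.19 | 0.24 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 1 | pp2048 | 32768 | 409.24 | 0.48 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 1 | tg256 | 32768 | 48.18 | 0.23 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 2 | pp2048 | 0 | 936.02 | 0.58 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 2 | tg256 | 0 | 62.82 | 0.37 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 2 | pp2048 | 16384 | 805.96 | 3.75 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 2 | tg256 | 16384 | 62.00 | 0.58 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 2 | pp2048 | 32768 | 734.65 | 3.16 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 2 | tg256 | 32768 | 60.41 | 0.61 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 4 | pp2048 | 0 | 1543.77 | 2.10 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 4 | tg256 | 0 | 62.75 | 0.58 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 4 | pp2048 | 16384 | 1360.97 | 16.06 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 4 | tg256 | 16384 | 63.52 | 1.18 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 4 | pp2048 | 32768 | 1231.25 | 7.67 |
| qwen35 9B Q8_0 | 6.3.3 | ON | 4 | tg256 | 32768 | 62.49 | 1.02 |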
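
The table is easier to read as ratios. A sketch with a handful of ctx 0 rows copied by hand from the table (the function names are illustrative): `graph_speedup` divides HIP graphs ON by OFF for tg256, and `gpu_scaling` divides each GPU count by the 1-GPU row for ROCm 7.2.3 with graphs OFF.

```bash
#!/usr/bin/env bash
# tg256 at ctx 0. Columns: ROCm version, GPUs, t/s graphs OFF, t/s graphs ON.
# Prints: ROCm, GPUs, ON/OFF ratio.
graph_speedup() {
  awk '{ printf "%s %s %.3f\n", $1, $2, $4 / $3 }' <<'EOF'
7.2.3 1 58.83 59.31
7.2.3 2 70.13 71.99
7.2.3 4 73.16 76.25
6.3.3 1 53.51 52.31
6.3.3 2 66.71 62.82
6.3.3 4 72.73 62.75
EOF
}

# ROCm 7.2.3, graphs OFF, ctx 0. Columns: test, GPUs, t/s.
# The 1-GPU row of each test must come first; it is the baseline.
gpu_scaling() {
  awk '{ if ($2 == 1) base[$1] = $3; printf "%s %s %.3f\n", $1, $2, $3 / base[$1] }' <<'EOF'
pp2048 1 507.79
pp2048 2 918.78
pp2048 4 1519.30
tg256 1 58.83
tg256 2 70.13
tg256 4 73.16
EOF
}

graph_speedup
gpu_scaling
```

For these runs, HIP graphs give about +1% to +4% tg256 on ROCm 7.2.3 (1.008, 1.027, 1.042 for 1/2/4 GPUs) but are slower on 6.3.3 (0.978, 0.942, 0.863). Going from 1 to 4 GPUs scales pp2048 by about 2.99× and tg256 by about 1.24×.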