Support status and runtime speed of each quantization method in the GGUF format
Feature / Library | CPU (AVX2) | CPU (ARM NEON) | Metal | cuBLAS | rocBLAS | SYCL | CLBlast | Vulkan | Kompute |
---|---|---|---|---|---|---|---|---|---|
K-quants | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (slow) | ✅ (slow) | ✅ |
I-quants | ✅ (slow) | ✅ (slow) | ✅ (slow) | ✅ | ✅ | Partial¹ | ✅ | ✅ | ✅ |
Multi-GPU | N/A | N/A | N/A | ✅ | ❓ | ✅ | ❓ | ✅ | ❓ |
K cache quants | ✅ | ❓ | ✅ | ✅ (slow) | Partial⁶ (slow) | ❓ | ✅ | ✅ | ✅ |
MoE architecture | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | ✅ | ✅ |
Note: