Qwen 3.6 27B - vLLM Performance Benchmark Results (BF16, FP8, NVFP4)
Sharing some testing of Qwen 3.6 27B using vLLM across the popular quants on my development system. I used llama benchy to generate the results, then fed them into an LLM to format them as tables for readability. While NVFP4 is blazing fast, I have had looping issues with it in Copilot that I don't get with BF16, and its responses in agent mode generally seem less thorough than those of the higher-precision quants. Based on these results, FP8 seems to be the right choice. Some of the parameters can be furt
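The post doesn't describe llama benchy's exact methodology. As a rough illustration of how serving benchmarks commonly turn one streamed request into throughput figures (time to first token, prompt-processing rate, generation rate), here is a minimal Python sketch. It assumes your client records per-request timestamps; the `RequestTiming` class, its field names and `throughput()` are hypothetical and are not llama benchy's API.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Wall-clock timestamps (seconds) and token counts for one streamed request.

    Hypothetical structure: adapt the fields to whatever your client records.
    """
    sent_at: float          # when the request was sent
    first_token_at: float   # when the first generated token arrived
    finished_at: float      # when the last generated token arrived
    prompt_tokens: int
    completion_tokens: int


def throughput(t: RequestTiming) -> dict[str, float]:
    """Approximate prompt-processing and generation rates from one request."""
    # Time to first token covers queueing, network and prefill of the prompt.
    ttft = t.first_token_at - t.sent_at
    # Prefill rate: prompt tokens processed before the first token appeared.
    pp_tps = t.prompt_tokens / ttft if ttft > 0 else 0.0
    # Decode rate: the first token already falls inside TTFT, so only the
    # remaining completion_tokens - 1 tokens belong to the decode window.
    decode_s = t.finished_at - t.first_token_at
    if decode_s > 0 and t.completion_tokens > 1:
        tg_tps = (t.completion_tokens - 1) / decode_s
    else:
        tg_tps = 0.0
    return {"ttft_s": ttft, "pp_tps": pp_tps, "tg_tps": tg_tps}


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the post.
    sample = RequestTiming(sent_at=0.0, first_token_at=0.5, finished_at=4.5,
                           prompt_tokens=1000, completion_tokens=201)
    print(throughput(sample))
```

Because TTFT here includes queueing and network time, the prefill figure is a lower bound on the server's true prompt-processing speed; dedicated tools may separate these phases differently.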