Qwen 3.6 27B - VLLM Performance Benchmark Results (BF16, FP8, NVFP4)

Summary

Sharing some testing of Qwen 3.6 27B using VLLM across the popular quants on my development system. I used llama benchy to generate the results, then fed it into an LLM to format it the tables for readibility. While NVFP4 is blazing fast, have had looping issues in copilot that I don't get with BF16, and the responses in general when used in agent mode seem to be less thorough than the higher quants. Based on these results, FP8 seems to be the right choice. Some of the parameters can be furt

SpinGraph analysis pending — check back after processing.

Ask AI about this story

Opens with the SpinGraph .md URL and structured context — one click, prompt included.

ChatGPT Claude Perplexity Gemini Grok

More from Reddit r/LocalLLaMA

View all →

Markdown (.md) · JSON-LD schema (.json) · Machine-readable for AI & GEO