DeepSeek-V4-Flash in MXFP4 is too slow on CPU
I have an old Xeon rig with 512 GB of 4-channel DDR4-2133 memory and an E5-2699v4 processor. The GPU is a GTX 1060 with 6 GB of VRAM, so I run in CPU-only mode. I can run GLM 5.2 (40B active parameters) in Q4_K_XL at 1.8 t/s, but that is too slow to be usable. So I wanted to try the new Bartowski quantization of DeepSeek-V4-Flash (13B active parameters) in MXFP4. Unfortunately, the most I can get is 3.2 t/s of token generation (tg), which is very disappointing. Judging by speeds of GLM 5.2 I was
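For context, here is a rough back-of-envelope sketch of what this hardware might be expected to do, assuming token generation is purely memory-bandwidth bound and every active weight is read once per token. The function names are made up for illustration. The 4.25 bits per weight figure for MXFP4 assumes 4-bit elements plus one 8-bit shared scale per 32-element block. The scaling estimate further assumes both quants use about the same number of bits per weight, which may not hold exactly for Q4_K_XL.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: float, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (each DDR4 channel is 64 bits wide)."""
    return channels * mega_transfers * 1e6 * bus_bytes / 1e9


def bandwidth_bound_tps(bandwidth_gbs: float, active_params_b: float, bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every active weight is streamed from RAM once per token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token


def scale_by_active_params(measured_tps: float, measured_active_b: float, target_active_b: float) -> float:
    """Naive estimate: speed scales inversely with active parameter count.

    Assumption: both models use a similar number of bits per weight.
    """
    return measured_tps * measured_active_b / target_active_b


# Numbers taken from the post: 4-channel DDR4-2133, GLM 5.2 at 1.8 t/s (40B active),
# DeepSeek-V4-Flash with 13B active parameters.
bandwidth = peak_bandwidth_gbs(channels=4, mega_transfers=2133)
ceiling = bandwidth_bound_tps(bandwidth, active_params_b=13, bits_per_weight=4.25)  # assumed bpw
scaled = scale_by_active_params(measured_tps=1.8, measured_active_b=40, target_active_b=13)

print(f"theoretical bandwidth: {bandwidth:.1f} GB/s")
print(f"bandwidth-bound ceiling for 13B active: {ceiling:.1f} t/s")
print(f"estimate scaled from GLM 5.2: {scaled:.1f} t/s")
```

Under these assumptions the theoretical ceiling comes out near 9.9 t/s and the estimate scaled from GLM 5.2 near 5.5 t/s, so the measured 3.2 t/s sits well below both. Real-world sustained bandwidth is lower than the theoretical peak, and attention, KV-cache reads, and kernel efficiency for a given quant format are all ignored here, so treat these as loose bounds rather than predictions.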