Source: Reddit r/LocalLLaMA (reddit.com, forum)
July 5, 2026 ai_technology community

DeepSeek-V4-Flash in MXFP4 is too slow on CPU


Summary

I have an old Xeon rig with 512 GB of 4-channel DDR4-2133 memory and an E5-2699v4 processor. My GPU is a GTX 1060 with 6 GB of VRAM, so I run in CPU-only mode. I can run GLM 5.2 (40B active parameters) in Q4_K_XL at 1.8 t/s, but as you can imagine, that is too slow. So I wanted to try Bartowski's new MXFP4 quantization of DeepSeek-V4-Flash, which has 13B active parameters. Unfortunately, the best I can get is 3.2 t/s of token generation (tg), which is very disappointing. Judging by the speeds of GLM 5.2, I was…
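The comparison above rests on a common rule of thumb: on CPU, token generation is mostly limited by memory bandwidth, so speed scales roughly with the bytes of active weights read per token. Below is a minimal back-of-envelope sketch, assuming that rule holds and ignoring KV-cache traffic, compute limits and NUMA effects. The 4.5 bits/weight figure for Q4_K_XL is a hypothetical average (real mixed-precision quants vary by tensor), and using 4.25 bits/weight for the whole DeepSeek-V4-Flash quant assumes every active tensor is stored as MXFP4 (32 FP4 values sharing one 8-bit scale). The function names are illustrative only.

```python
# Back-of-envelope token-generation (tg) estimate for CPU-only MoE inference.
# Assumption: tg is memory-bandwidth bound, i.e. every active weight is read
# from RAM once per generated token. KV cache, compute and NUMA are ignored.


def ddr_bandwidth_gbs(channels: int, mts: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x transfers/s x bytes per transfer."""
    return channels * mts * 1e6 * bus_bytes / 1e9


def tg_ceiling(active_params_b: float, bits_per_weight: float, bandwidth_gbs: float) -> float:
    """Upper bound on tokens/s if each token streams all active weights from RAM."""
    gb_per_token = active_params_b * bits_per_weight / 8  # billions of params -> GB
    return bandwidth_gbs / gb_per_token


# MXFP4: blocks of 32 FP4 values share one 8-bit scale -> 4 + 8/32 bits per weight.
MXFP4_BPW = 4 + 8 / 32
# Hypothetical average bits per weight for Q4_K_XL (assumption, varies by tensor).
Q4_K_XL_BPW = 4.5

bw = ddr_bandwidth_gbs(channels=4, mts=2133)
glm_ceiling = tg_ceiling(40, Q4_K_XL_BPW, bw)
ds_ceiling = tg_ceiling(13, MXFP4_BPW, bw)

# Scale the measured GLM 5.2 speed by the ratio of bytes read per token.
measured_glm = 1.8
expected_ds = measured_glm * (40 * Q4_K_XL_BPW) / (13 * MXFP4_BPW)

print(f"peak bandwidth          : {bw:.1f} GB/s")
print(f"GLM 5.2 ceiling         : {glm_ceiling:.1f} t/s")
print(f"DeepSeek-V4-Flash ceiling: {ds_ceiling:.1f} t/s")
print(f"scaled from GLM 5.2     : {expected_ds:.1f} t/s")
```

Under these assumptions the theoretical peak for 4-channel DDR4-2133 is about 68 GB/s, and scaling the measured 1.8 t/s of GLM 5.2 by the ratio of bytes read per token suggests roughly 5.9 t/s for DeepSeek-V4-Flash, well above the observed 3.2 t/s. Real throughput usually lands below the theoretical ceiling, so treat these numbers as rough estimates rather than targets.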
