I merged fixes for quantized KV cache into my DeepSeek V4 branch

Summary

Check it out: https://github.com/fairydreaming/llama.cpp/tree/dsv4

The fixes are PRs #25247, #25303 (mine) and #25202 (from am17an), but I omitted some padding changes from the last one that I don't think are necessary. So if it crashes for you, let me know.

Also, some perplexity values:

f16:

$ ./bin/llama-perplexity -m ~/ggufs/DeepSeek-V4-Flash.gguf -f ../../perplexity/wikitext-2-raw/wiki.test.raw -c 8192 -b 8192 -ub 8192 -cmoe -fit off -fa 1
0.00.474.417 W llama_model_loader: tensor overrides to CPU ar
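For anyone comparing the numbers across cache types: perplexity is the exponential of the mean negative log-likelihood per token, so lower is better and a quantized KV cache that hurts quality shows up as a higher value. Here is a minimal Python sketch of that definition. It is not how llama-perplexity is implemented, and the log-probabilities are made-up placeholders, not results from any run.

```python
import math

def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical values for illustration only: a model that assigns
# probability 0.25 to every correct token has perplexity of about 4.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

A model that is equally unsure between N options at every token scores a perplexity of N, which is why small differences between f16 and quantized cache runs are the thing to look at.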

