RTX 5090, gemma-4-31B-it-Q6_K.gguf. Context: 35k before, 80k after!
Summary
Yesterday someone posted that you can increase the context size for DeepSeek Flash. It turns out the same setup works for Gemma 4 too!

```shell
function dockergemma () {
  docker run \
    -e GGML_CUDA_NO_PINNED=1 \
    -p "$PORT_GEMMA":"$PORT_GEMMA" \
    -v "$LLM_PATH" \
    -v "$WORKSPACE_PATH" \
    --gpus "$LLM_GPU1" "$LLM_DOCKER_IMAGE" \
    --host 0.0.0.0 --threads 23 --flash-attn on --fit off --main-gpu 1 --jinja \
    --port "$PORT_GEMMA" \
    --ctx-size
```
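Since the headline number here is context length, it may help to see why context is so sensitive to VRAM on a 32 GB card: the KV cache grows linearly with the number of tokens. The shell sketch below estimates that size for a model with standard full attention and an unquantized cache. The function names `kv_cache_bytes` and `bytes_to_gib` are hypothetical, and the layer and head counts in the usage loop are placeholders, not Gemma 4 31B's real configuration. Gemma models also interleave sliding-window attention layers, which cache fewer tokens, so treat the result as a rough, likely high, estimate rather than a measurement.

```shell
# Rough KV-cache size for a standard (full) attention model:
#   bytes = 2 (K and V) * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem
# Assumption: every layer caches every token; sliding-window layers would need less.
kv_cache_bytes() {
  local n_layers=$1 n_kv_heads=$2 head_dim=$3 ctx_tokens=$4 bytes_per_elem=$5
  echo $(( 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem ))
}

# Convert a byte count to GiB with two decimals, using bash integer math only
# (the fractional part is truncated, not rounded).
bytes_to_gib() {
  local b=$1
  local gib=$(( 1024 * 1024 * 1024 ))
  printf '%d.%02d\n' $(( b / gib )) $(( (b % gib) * 100 / gib ))
}

# Usage with HYPOTHETICAL placeholder values (60 layers, 8 KV heads,
# head_dim 128, 2 bytes per element for an f16 cache):
for ctx in 35000 80000; do
  echo "ctx=$ctx -> $(bytes_to_gib "$(kv_cache_bytes 60 8 128 "$ctx" 2)") GiB"
done
```

Because the estimate scales linearly with the token count, going from 35k to 80k tokens multiplies the cache by about 2.3x under any fixed model configuration.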
More from Reddit r/LocalLLaMA
- DeepSeek-V4-Flash in MXFP4 is too slow on CPU
- GH Copilot’s BYOK Blocking for Inline Completion Makes No Sense. [THE FIX]
- Agents-A1-Q8_0-GGUF works pretty well for me (anecdotal feedback)
- Best choice of model 40B+ Parameters
- Any word on Qwen 3.7 9B? (Also looking for 9B-class alternatives to Qwen 3.5)
- [Paper] Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling