RTX 5090, gemma-4-31B-it-Q6_K.gguf. Context: 35k before, 80k after!

Summary

Yesterday someone posted that the context for DeepSeek Flash can be increased. It turns out the same thing works for Gemma 4 too!

function dockergemma () {
  docker run \
    -e GGML_CUDA_NO_PINNED=1 \
    -p "$PORT_GEMMA":"$PORT_GEMMA" \
    -v "$LLM_PATH" \
    -v "$WORKSPACE_PATH" \
    --gpus "$LLM_GPU1" "$LLM_DOCKER_IMAGE" \
    --host 0.0.0.0 --threads 23 --flash-attn on --fit off --main-gpu 1 --jinja \
    --port "$PORT_GEMMA" \
    --ctx-size …
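
The function above reads five environment variables (PORT_GEMMA, LLM_PATH, WORKSPACE_PATH, LLM_GPU1, LLM_DOCKER_IMAGE) that the post never defines. As a minimal sketch, assuming a POSIX-compatible shell, a hypothetical helper `check_gemma_env` (not part of the original post) can confirm they are all non-empty before `docker run` is attempted:

```shell
# Hypothetical pre-flight check; the name check_gemma_env is illustrative only.
# Returns non-zero and prints every variable that dockergemma expects but is empty.
check_gemma_env() {
  _missing=0
  for _var in PORT_GEMMA LLM_PATH WORKSPACE_PATH LLM_GPU1 LLM_DOCKER_IMAGE; do
    # POSIX-compatible indirect expansion: load the value of the variable named in $_var
    eval "_val=\${$_var:-}"
    if [ -z "$_val" ]; then
      echo "missing: $_var" >&2
      _missing=1
    fi
  done
  return "$_missing"
}
```

For example, `check_gemma_env && dockergemma` would only launch the container when every variable is set; otherwise the missing names are printed to stderr and nothing is started.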

More from Reddit r/LocalLLaMA