Agents-A1-Q8_0-GGUF works pretty well for me (anecdotal feedback)
View original on reddit.comSummary
For the last day or so I've been using Agents A1 Q8 InternScience/Agents-A1-Q8_0-GGUF on my M1 Max mac (64GB) just like this: llama-server -hf InternScience/Agents-A1-Q8_0-GGUF --host 0.0.0.0 --port 8080 --temp 0.85 --top-p 0.95 --top-k 20 --min-p 0.0 --presence-penalty 1.1 --repeat-penalty 1.0 (these are the parameters they recommend) With full 262K context available I am getting about 500 t/s pp and about 40 t/s tg. I've been using opencode with it and it seems to be roughly Qwen level
SpinGraph analysis pending — check back after processing.
Ask AI about this story
Opens with the SpinGraph .md URL and structured context — one click, prompt included.
More from Reddit r/LocalLLaMA
View all →- Considering Buying Another RTX 3090 - Benefits?
- longcat 2.0 (1.6T, ~48B active) weights are now open under MIT license
- DeepSeek-V4-Flash in MXFP4 is too slow on CPU
- GH Copilot’s BYOK Blocking for Inline Completion Makes No Sense. [THE FIX]
- Best choice of model 40B+ Parameters
- Any word on Qwen 3.7 9B? (Also looking for 9B-class alternatives to Qwen 3.5)
Markdown (.md) · JSON-LD schema (.json) · Machine-readable for AI & GEO