Uh.. Honey, how do you feel about takeout?
- 2x RTX Pro 6000 Max-Q (96GB)
- 8x RTX 3090 (24GB)
- 2x RTX 5090 (32GB)
- 3 PSUs
- 128GB DDR5 SDIMM RAM (4-channel)
- Threadripper 9960X
- 1x Ryobi Portable Fan
- 1x large Uber Eats bill

448GB VRAM

Running MiniMax M3 in AWQ-INT4 on vLLM via PP over TP groups of 2.

~30 tok/s per single stream
~960 tok/s batch

Can get 1M context for one user, but ideally want 4x concurrency. TBD where context will land… or my marriage…

submitted by /u/MotorcyclesAndBizniz
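For anyone checking the math, here is a rough sketch of how the 448GB total and the "PP over TP groups of 2" layout add up. The card counts and sizes come from the list above. The pairing of identical cards and the resulting six pipeline stages are my assumptions, since the post doesn't say how the groups are actually formed.

```python
# GPU inventory as listed in the post: (name, count, VRAM per card in GB)
gpus = [
    ("RTX Pro 6000 Max-Q", 2, 96),
    ("RTX 3090", 8, 24),
    ("RTX 5090", 2, 32),
]

total_gpus = sum(count for _, count, _ in gpus)
total_vram_gb = sum(count * vram for _, count, vram in gpus)

# From the post: tensor-parallel (TP) groups of 2.
tp_size = 2

# Assumption: every GPU belongs to one TP pair and the pairs are chained
# as pipeline-parallel (PP) stages, so PP size = number of pairs.
pp_size = total_gpus // tp_size

# Assumption: pairs are built from identical cards, so each stage's
# combined VRAM is simply twice the per-card VRAM.
stages = [
    (name, vram * tp_size)
    for name, count, vram in gpus
    for _ in range(count // tp_size)
]

print(f"GPUs: {total_gpus}, total VRAM: {total_vram_gb}GB")
print(f"TP size: {tp_size}, PP size: {pp_size}")
for i, (name, stage_vram) in enumerate(stages):
    print(f"  stage {i}: 2x {name} ({stage_vram}GB)")
```

If that reading is right, it would map to a tensor-parallel size of 2 and a pipeline-parallel size of 6 in vLLM. The stages are far from equal in memory (192GB vs 48GB vs 64GB per pair), and the post doesn't say how layers are split across them or what the launch command looks like.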