Uh.. Honey, how do you feel about takeout?

Summary

- 2x RTX Pro 6000 Max-Q (96GB)
- 8x RTX 3090 (24GB)
- 2x RTX 5090 (32GB)
- 3 PSUs
- 128GB DDR5 RDIMM RAM (4-channel)
- Threadripper 9960X
- 1x Ryobi portable fan
- 1x large Uber Eats bill

448GB VRAM total.

Running MiniMax M3 in AWQ-INT4 on vLLM, using pipeline parallelism (PP) across tensor-parallel (TP) groups of 2.

- ~30 tok/s for a single stream
- ~960 tok/s batched

I can get 1M context for one user, but ideally I want 4x concurrency. TBD where the context will land… or my marriage…

submitted by /u/MotorcyclesAndBizniz
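For anyone checking the math, here is a rough sketch that tallies the VRAM from the card list and lays the 12 GPUs out as TP pairs, with each pair acting as one PP stage. The pairing of identical cards into the same TP group is an assumption (the post doesn't say which cards share a group), and the names `GPUS`, `TP_SIZE`, `total_vram` and `tp_groups` are hypothetical, just for illustration.

```python
# Hypothetical sketch: tally the GPUs from the post and split them into
# tensor-parallel (TP) groups of 2, each group treated as one pipeline (PP) stage.
# ASSUMPTION: identical cards are paired together; the post doesn't specify this.

GPUS = [
    ("RTX Pro 6000 Max-Q", 2, 96),  # (model, count, VRAM per card in GB)
    ("RTX 3090", 8, 24),
    ("RTX 5090", 2, 32),
]
TP_SIZE = 2


def total_vram(gpus):
    """Sum VRAM across all cards, in GB."""
    return sum(count * gb for _, count, gb in gpus)


def tp_groups(gpus, tp_size):
    """Split each card model into groups of tp_size identical cards.

    Returns a list of (model, combined VRAM in GB), one entry per PP stage.
    """
    groups = []
    for name, count, gb in gpus:
        if count % tp_size:
            raise ValueError(f"{count}x {name} can't be split into TP groups of {tp_size}")
        for _ in range(count // tp_size):
            groups.append((name, tp_size * gb))
    return groups


if __name__ == "__main__":
    groups = tp_groups(GPUS, TP_SIZE)
    print(f"Total VRAM: {total_vram(GPUS)} GB")  # 192 + 192 + 64 = 448
    print(f"TP size {TP_SIZE} x PP size {len(groups)} = {TP_SIZE * len(groups)} GPUs")
    for stage, (name, gb) in enumerate(groups):
        print(f"  PP stage {stage}: {TP_SIZE}x {name}, {gb} GB")
    # Ratio of the quoted batched and single-stream throughput numbers.
    print(f"Batched vs single-stream: ~{960 // 30}x")
```

Under this pairing you get 6 stages, but they are far from equal: the Pro 6000 pair holds 192 GB while each 3090 pair holds 48 GB. That imbalance is worth keeping in mind when deciding how many layers each stage gets in a mixed-card build like this.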

More from Reddit r/LocalLLaMA