GLM-5 has 744B parameters and scores worse on MMLU-Pro than a 9B model

Summary

Tier lists make S-tier and D-tier feel like entirely different categories of thing: a red box at the top, a blue box at the bottom. I plotted named models by parameter count against MMLU-Pro score instead of trusting the tier labels, and the picture is a lot messier than "bigger tier = bigger gap." Qwen3.5-9B, a 9B model, scores 82.5% on MMLU-Pro. GLM-5, at 744B parameters (nearly 83x the size), scores 70.4%. That's not a diminishing-returns curve; that's negative returns. The 9B…
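A quick way to check the arithmetic behind the "negative returns" claim is to compute the size ratio and score change from the two data points quoted above. This is a minimal sketch that assumes the parameter counts are total parameters, as stated in the post. The helper name `compare` is hypothetical. Two points give a slope, not a fitted scaling curve.

```python
import math

def compare(small_params_b, small_score, large_params_b, large_score):
    """Hypothetical helper: compare two models given params (billions) and MMLU-Pro (%).

    Returns (size ratio, score change in points, points gained per doubling of params).
    """
    ratio = large_params_b / small_params_b
    delta = large_score - small_score
    # A negative value here means the larger model scores lower per doubling of size.
    per_doubling = delta / math.log2(ratio)
    return ratio, delta, per_doubling

# Figures quoted in the post: Qwen3.5-9B (9B, 82.5%) vs GLM-5 (744B, 70.4%).
ratio, delta, per_doubling = compare(9, 82.5, 744, 70.4)
print(f"GLM-5 is {ratio:.1f}x the size of Qwen3.5-9B")
print(f"MMLU-Pro change: {delta:+.1f} points")
print(f"Change per doubling of parameters: {per_doubling:+.2f} points")
```

The per-doubling figure puts the gap on a log scale, which is how parameter-count comparisons are usually read. A negative value is what "negative returns" means here, at least for this one pair of models.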



More from Reddit r/artificial