Tested 4 brand-new frontier models (2 Chinese, 1 diffusion, 1 agent-focused) with a riddle that has no logical shortcut. One of them fabricated sources four times in a row.

Summary

I've been running the same weird test on every new model that ships: a riddle that can't be solved by pattern-matching or web search, only by actually connecting two unrelated things. This time I added a second riddle and ran both against four models that all shipped in the last few weeks: MiMo-V2.5-Pro (Xiaomi), MiniMax M3, Mercury 2 (Inception Labs, diffusion-based), and LongCat-2.0 (Meituan). Rules: no web search, no context given beforehand, up to 3 hints only if requested, same prom
