2 related articles

mini-SWE-agent's GPT-5 series evaluation on SWE-bench shows GPT-5 matches Claude Sonnet 4, while GPT-5-mini loses only ~5 points at less than 1/5 the cost.

The SWE-agent team finds that mini-SWE-agent, when randomly switching between GPT-5 and Claude Sonnet 4, outscores either model alone on SWE-bench. The article explores the diversity hypothesis behind Roulette Mode.