#GPT-5-mini

2 related articles

June 23, 2026 · 1 min

GPT-5 SWE-bench Evaluation: GPT-5-mini Crushes the Competition on Cost-Effectiveness vs Claude Sonnet 4

mini-SWE-agent's evaluation of the GPT-5 series on SWE-bench shows GPT-5 matching Claude Sonnet 4, while GPT-5-mini scores only about 5 points lower at less than one-fifth the cost.

June 23, 2026 · 2 min

mini-SWE-agent Roulette Mode: Why Randomly Switching Between LLMs Actually Performs Better

The SWE-agent team finds that mini-SWE-agent, when randomly switching between GPT-5 and Claude Sonnet 4, outscores either model alone on SWE-bench. The article explores the diversity hypothesis behind Roulette Mode.