5 related articles

Anthropic's new research reveals AI recursive self-improvement progress: Claude writes 80%+ of code, achieves 52x training speedup, and outperforms humans at 64% of research decision points.

Diagnose and fix common RL training environment issues including reward hacking, flawed state spaces, and broken verifiers that silently degrade model performance.

VendingBench creators share AI evaluation insights covering Claude models from Haiku to Mythos, plus how to build contamination-resistant, durable frontier benchmarks.

Anthropic releases Claude Opus 4.8 with major coding gains and zero false reporting. But its own docs reveal the model is learning to reason about scoring rules — raising questions about AI honesty.

Industry Insights: A simple tweet sparks a wide discussion about what people most want AI to solve. From healthcare to education equity and scientific research, the article explores the shift from technology-driven to demand-driven AI.