#AI safety evaluation

7 related articles

2026年6月18日·3 min

OpenAI's Head of Evaluations: Never Underestimate a Model's Capabilities

OpenAI's Frontier Evaluations lead Tejal Patwardhan shares insights on O1's jailbreak breakthrough, wet lab experiments beating human baselines, and building the AGI Index—revealing AI capabilities evolving faster than imagined.

2026年6月17日·3 min

Testing DeepSeek's Safety Mechanisms: Multiple Jailbreak Attempts Successfully Blocked

An overseas security blogger systematically tested DeepSeek's jailbreak resistance using direct requests, rephrased prompts, and varied strategies. Results show robust intent recognition, consistent blocking, and context-aware safety mechanisms.

2026年6月6日·3 min

From Claude Oceanus to GPT-5.6: A Complete Breakdown of This Week's Major AI Model Updates

Deep analysis of this week's major AI model updates: Anthropic Oceanus red team leak, OpenAI GPT-5.6 Dual Alpha exposed, NVIDIA Nemotron Ultra 550B release, and AI recursive self-improvement research breakthrough.

2026年6月4日·4 min

OpenAI Red Team Testing Revealed: How Models Get 'Broken' Before Release

OpenAI reveals a critical pre-release step: dedicated red teams break and stress-test AI models. Learn how red teaming works, industry safety trends, and practical implications for developers.

2026年6月4日·4 min

OpenAI Red Teaming Revealed: How Models Get 'Broken' Before Release

OpenAI reveals a critical pre-release step: dedicated red teams break and stress-test AI models. Learn how red teaming works, industry safety trends, and practical implications for developers.

Tech Frontiers

AI Weekly: Claude Code Review, Gemma 4…

2026年6月1日·3 min

AI Weekly: Claude Code Review, Gemma 4 Leak & DeepSeek V4 Delayed

Weekly AI roundup: Anthropic launches Claude Code review, Google Gemma 4 leaks with MoE architecture, DeepSeek V4 delayed again, Microsoft Copilot Cowork reshapes collaboration, and OpenAI acquires PromptFool.

Tech Frontiers

Claude Opus 4.8 Deep Dive: Honesty Mat…

2026年5月28日·2 min

Claude Opus 4.8 Deep Dive: Honesty Matters More Than Benchmarks

Claude Opus 4.8 core upgrade: code bug oversight rate reduced 4x, model becomes more honest. Covers Dynamic Workflows parallel orchestration, Claude Code quota reset, effort control, and upcoming Miscells model.