#AI safety mechanisms

2 related articles

June 17, 2026 · 3 min

Testing DeepSeek's Safety Mechanisms: Multiple Jailbreak Attempts Successfully Blocked

An overseas security blogger systematically tested DeepSeek's jailbreak resistance using direct requests, rephrased prompts, and varied strategies. The results show robust intent recognition, consistent blocking, and context-aware safety mechanisms.

June 13, 2026 · 2 min

AI Automated Review Becomes the Default: How a Sub-Agent Classifier Achieves 97% Accuracy

AI agent auto-review is now the default for all users. A classifier subagent reaches 97% accuracy and makes three-tier safety decisions. A deep dive into how it works and what it means for AI safety.