ViBench Benchmark: End-to-End App Creation Evaluation Reveals the True Level of AI Programming
ViBench is the first end-to-end app creation benchmark based on real-world tasks. The results show Claude Opus 4.8 leading in both performance and cost-effectiveness, and they reveal gaps between SWE-bench scores and actual development capability.