#LLM Evaluation

3 related articles

June 2, 2026 · 2 min

Testing 15 LLMs to Build a Bilibili Homepage: GPT Takes the Crown, Domestic Models Fall Behind

15 mainstream LLMs tested building a Bilibili video app from the same prompt. ChatGPT 5.4 tops overall, Claude excels at frontend, domestic models lag behind.

Product Reviews

June 2, 2026 · 2 min

Benchmarking 15 LLMs Building a Bilibili Homepage: GPT Takes the Crown, Domestic Models Fall Behind

15 mainstream LLMs tested building a Bilibili video app from the same prompt. ChatGPT 5.4 tops overall, Claude excels at frontend, domestic models lag behind.

Research

May 29, 2026 · 2 min

AI Gaming Showdown: O3 Pro Demonstrates Stunning Planning Capabilities

Researchers tested major AI models with Tetris, Super Mario, and Sokoban. O3 Pro showed unprecedented planning ability, becoming the only model to clear all levels. Game testing reveals AI's evolution from pattern matching to strategic thinking.