
Artificial Analysis (@ArtificialAnlys) “GLM-5.2 leads open weights models and sits at #3 overall on GDPval-AA, a real-wo” — TopicDigg

Artificial Analysis
@ArtificialAnlys
Independent analysis of AI
Joined January 2024
645 Following    105.3K Followers
GLM-5.2 leads open weights models and sits at #3 overall on GDPval-AA, a real-world agentic work benchmark

GLM-5.2 from @Zai_org scores 1524 Elo on GDPval-AA, which measures performance on real-world, economically valuable knowledge work through long-horizon, multi-turn tasks.

Key takeaways:
➤ #3 overall, behind only Claude Fable 5 (1783) and Claude Opus 4.8 (1615), and level with GPT-5.5 (xhigh, 1509)
➤ The leading open weights model by a wide margin: the next open model, MiniMax-M3, scores 1408
➤ Ahead of many proprietary models, including Google's Gemini 3.5 Flash (1357), Qwen 3.7 Max (1289), Muse Spark (1158)
➤ The tasks are agentic. GLM-5.2 averaged ~31 turns per task across 1,999 matches
➤ Consistent with the rest of its launch, GLM-5.2 also leads open weights on the Artificial Analysis Intelligence Index, ranks #3 on the Agentic Index, and #3 on AA-Briefcase
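
For readers who want a feel for what these Elo gaps imply, here is a minimal sketch that converts a rating difference into an expected head-to-head win rate. It assumes the conventional logistic Elo formula with a 400-point scale. The post does not say how GDPval-AA fits its ratings, so the formula, the scale, and the helper name `expected_score` are assumptions for illustration only.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional Elo formula.

    Assumption: standard logistic Elo with a 400-point scale. The benchmark's
    actual rating method is not described in the post and may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the post.
glm_5_2 = 1524
minimax_m3 = 1408
gpt_5_5_xhigh = 1509

if __name__ == "__main__":
    print(f"GLM-5.2 vs MiniMax-M3: {expected_score(glm_5_2, minimax_m3):.2f}")
    print(f"GLM-5.2 vs GPT-5.5 (xhigh): {expected_score(glm_5_2, gpt_5_5_xhigh):.2f}")
```

Under that assumption, the 116-point lead over MiniMax-M3 corresponds to an expected win rate of roughly two in three, while the 15-point gap to GPT-5.5 (xhigh) is close to even, which fits the post's description of the two as level.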