
Arena.ai (@arena) “GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely” — TopicDigg

Arena.ai
@arena
Where AI meets the real world. Formerly LMArena. We measure and advance the frontier of AI through community-driven evaluation. We’re hiring →
Joined March 2023
215 Following    167.6K Followers
GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely matching Claude-Opus-4.8 (non-thinking) and is the #1 open model by a wide margin!

In Agent Arena, we measure models on millions of real-world, long-horizon agentic tasks from a global community of users. Models can access web search, filesystem, and terminal tools to complete complex workflows. The leaderboard measures model performance on outcomes relative to the average model using a causal tracing methodology.

Compared to 5.1, GLM-5.2 (Max) climbs from #13 to #10. Its clearest gains are confirmed task success, and user praise vs. complaint. Bash capabilities and tool hallucination remain stable. There is a tradeoff in steerability compared to the previous model (-6.0% vs. +1.2%).

GLM-5.2 remains the same price as GLM-5.1, $1.4/$4.4 per input/output MTokens. 1M context window.

Huge congrats @Zai_org for the incredible release! See thread for details on how GLM-5.2 (Max) performs across 5 different signals.
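
For a rough sense of what the quoted pricing means per task, here is a minimal sketch in Python. It assumes cost scales linearly with token counts at the $1.4/$4.4 per input/output MTokens stated in the post. The helper name `estimate_cost_usd` and the example token counts are hypothetical, and any caching or volume discounts are ignored.

```python
# Prices quoted in the post, in USD per million tokens (MTokens).
INPUT_PRICE_PER_MTOK = 1.4
OUTPUT_PRICE_PER_MTOK = 4.4


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: linear cost estimate, no caching or discounts assumed."""
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Illustrative (made-up) workload: a long agentic task that reads
# 800K tokens of context and writes 50K tokens of output.
cost = estimate_cost_usd(800_000, 50_000)
print(f"Estimated cost: ${cost:.2f}")
```

At these rates, output tokens cost a little over three times as much as input tokens, so for context-heavy agentic runs the input side can still dominate the bill.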