Arena.ai (@arena) — TopicDigg

2026.06.16 18:55

Exciting news: GLM-5.2 (Max) ranks #2 in Code Arena: Frontend, +29 points ahead of Claude Opus 4.7 (Thinking) and behind only Fable 5! GLM-5.2 is the best open model, ahead of Kimi-K2.6 and Minimax-M3 by a large margin.
- #2 on the React sub-leaderboard and #4 on the HTML sub-leaderboard
- Ranks as the top model in nearly all subcategories: Brand & Marketing, Reference-Based Design, Data & Analytics, Consumer Product, Gaming, and Simulations
Congrats @Zai_org on the incredible milestone!



Arena.ai (@arena)

2026.06.16 17:58

GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely matching Claude-Opus-4.8 (non-thinking), and is the #1 open model by a wide margin! In Agent Arena, we measure models on millions of real-world, long-horizon agentic tasks from a global community of users. Models can access web search, filesystem, and terminal tools to complete complex workflows. The leaderboard measures each model's performance on outcomes relative to the average model, using a causal tracing methodology. Compared to GLM-5.1, GLM-5.2 (Max) climbs from #13 to #10. Its clearest gains are in confirmed task success and in user praise vs. complaint. Bash capabilities and tool hallucination remain stable. There is a tradeoff in steerability compared to the previous model (-6.0% vs. +1.2%). GLM-5.2 keeps the same price as GLM-5.1: $1.4/$4.4 per million input/output tokens, with a 1M context window. Huge congrats @Zai_org on the incredible release! See the thread for details on how GLM-5.2 (Max) performs across 5 different signals.



Arena.ai (@arena)

2026.05.26 15:36

Qwen3.7 Max (20250517) debuts at #4 in Code Arena: Frontend, the top rank for a Chinese lab on the board. It surpasses GLM-5.1 and is now on par with Claude Opus 4.6 on agentic web development tasks. Huge congrats to @Alibaba_Qwen on this achievement!



Arena.ai (@arena)

2026.05.18 15:42

Qwen3.7 Preview by @Alibaba_Qwen lands on Arena for Text and Vision. In Text Arena, Qwen3.7 Max Preview ranks #13 overall. Alibaba is now the #6 lab in this arena.
- #7 Math
- #9 Expert
- #9 Software & IT
- #10 Coding
In Vision Arena, Qwen3.7 Plus Preview ranks #16 overall, making Alibaba the #5 lab. Congrats to the @Alibaba_Qwen team on the latest progress!



Arena.ai (@arena)

2026.05.14 16:59

US vs China update. Stanford's AI Index put the US–China gap at 2.7%. Here's what two years of real-world use from the Text Arena shows. Gap three years ago: +278. Today: +29. At the top, it is @AnthropicAI's Claude Opus 4.6 Thinking vs. Baidu's @ErnieforDevs Ernie 5.1. The US has never lost #1, but the gap keeps closing.
