
Cline (@cline) “We've kept hearing how GLM-5.2 beats Opus 4.8, and are skeptical of benchmarks -” — TopicDigg

Cline
@cline
The open source coding agent that takes over your editor, terminal, and browser to complete work autonomously. npm i -g cline
Joined January 2025
7 Following    64.3K Followers
We've kept hearing how GLM-5.2 beats Opus 4.8, and are skeptical of benchmarks - so we tested them on a real bug from the Cline repo. While both models fixed the issue, GLM was the winner in terms of cost and code quality:

- GLM used twice as many tokens (GLM 1.1M vs Opus 660K) but cost half as much (GLM $0.41 vs Opus $0.81)
- Opus finished quicker - 1.6 min and 12 tool calls vs GLM 4.7 min and 28 tool calls
- GLM cleaned up dead code and verified the build compiled before completing. Opus didn't - it left type errors that passed tests but broke the production build.

Both runs used the same Cline harness prompting and tools, so it seems GLM is RL trained to spend more tokens verifying its work before completing. Impressive work by the @Zai_org team!
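For readers who want to check the arithmetic, here is a rough sketch that recomputes the ratios from the rounded figures quoted in the post. The `runs` dictionary and the helper names are hypothetical, and the "blended" rate is simply total cost divided by total tokens for this one run, not either provider's published pricing.

```python
# Figures as quoted in the post (rounded there, so results are approximate).
runs = {
    "GLM": {"tokens": 1_100_000, "cost_usd": 0.41, "minutes": 4.7, "tool_calls": 28},
    "Opus": {"tokens": 660_000, "cost_usd": 0.81, "minutes": 1.6, "tool_calls": 12},
}


def glm_to_opus_ratio(metric: str) -> float:
    """Return GLM's value divided by Opus's value for one metric."""
    return runs["GLM"][metric] / runs["Opus"][metric]


def blended_cost_per_million_tokens(model: str) -> float:
    """Total run cost divided by total tokens, scaled to one million tokens.

    This is a derived, run-specific number (an assumption of this sketch),
    not a list price.
    """
    run = runs[model]
    return run["cost_usd"] / run["tokens"] * 1_000_000


if __name__ == "__main__":
    for metric in ("tokens", "cost_usd", "minutes", "tool_calls"):
        print(f"GLM/Opus {metric}: {glm_to_opus_ratio(metric):.2f}x")
    for model in runs:
        rate = blended_cost_per_million_tokens(model)
        print(f"{model} blended cost: ${rate:.2f} per 1M tokens")
```

On these rounded numbers the token ratio comes out closer to 1.7x than a full 2x, while the cost ratio is very nearly one half, so "twice as many tokens" reads best as an approximation.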