0xFrancis (@xiao_zcloak)

2026.05.27 07:43

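The DeepSWE leaderboard ranks models specifically on long-horizon, complex engineering tasks, and at a glance it matches my own hands-on experience very closely. Some interesting observations: 1. GPT-5.5 is currently the undisputed king of engineering work. 2. For its price, Claude is really underwhelming. 3. Gemini 3.5 Flash is now usable; its edge is speed. 4. Among Chinese open-source models, Kimi is still the strongest. 5. Deepseek v4 pro is basically on par with Gemini 3 Flash, which is disappointing.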

Serena Ge (Datacurve) @serenaa_ge

2026.05.26 16:18

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
