注册并分享邀请链接,可获得视频播放与邀请奖励。

与「K2」相关的搜索结果

K2 贴吧
一个关键词就是一个贴吧,路径全站唯一。
创建贴吧
用户
未找到
包含 K2 的内容
FREEBUFF 让你可以使用 DeepSeek V4 Pro、Kimi K2.6 和 MiniMax M2.7 运行免费的编程智能体。
Announcing the Artificial Analysis Coding Agent Index! Our new coding agent benchmarks measure how combinations of agent harnesses and models perform on 3 leading benchmarks, token usage, cost and more When developers use AI to code they’re choosing a model, but also pairing it with a specific harness. It makes sense to benchmark that combination to understand and compare performance. The Artificial Analysis Coding Agent Index includes 3 leading benchmarks that represent a broad spectrum of coding agent use: ➤ SWE-Bench-Pro-Hard-AA, 150 realistic coding tasks that frontier models struggle with, sampled from Scale AI’s SWE-Bench Pro ➤ Terminal-Bench v2, 84 agentic terminal tasks from the Laude Institute and that range from system administration and cryptography to machine learning. 5 tasks were filtered due to environment incompatibility ➤ SWE-Atlas-QnA, 124 technical questions developed by Scale AI about how code behaves, root causes of issues, and more, requiring agents to explore codebases and give text answers Analysis of results: ➤ Opus 4.7 and GPT-5.5 lead the Index: Opus 4.7 in Cursor CLI scores 61, followed closely by GPT-5.5 in Codex and Opus 4.7 in Claude Code at 60. GPT-5.5 in Cursor CLI follows at 58. ➤ Open weights models are competitive, but still trail the leaders: GLM-5.1 in Claude Code is the top open-weight result at 53, followed by Kimi K2.6 and DeepSeek V4 Pro in Claude Code at 50. These are strong results, but still meaningfully behind the top proprietary models. ➤ Gemini 3.1 Pro in Gemini CLI underperforms: Gemini 3.1 Pro in Gemini CLI scores 43, well below where Gemini 3.1 Pro sits on our Intelligence Index, highlighting that Gemini’s performance in Gemini CLI remains a relative weak spot for Google’s offering. ➤ Cost per task (API token pricing) varies >30x: Composer 2 in Cursor CLI is cheapest at $0.07/task, followed by DeepSeek V4 Pro in Claude Code at $0.35/task and Kimi K2.6 in Claude Code at $0.76/task. At the high end, GPT-5.5 in Codex costs $2.21/task, while GLM-5.1 in Claude Code costs $2.26/task. For both models this was contributed to by high token usage, and in GPT-5.5’s case by a relatively higher per token cost. ➤ Token usage varies >3x: GLM-5.1 in Claude Code uses the most tokens at 4.8M/task, followed by Kimi K2.6 at 3.7M/task and DeepSeek V4 Pro at 3.5M/task. GPT-5.5 in Codex uses 2.8M tokens/task, substantially more than Opus 4.7 in Claude Code at 1.7M/task. In GLM-5.1’s case, higher token usage, cost and execution time were partly driven by the model entering loops on some tasks. ➤ Cache hit rates remain high but vary materially: Cache hit rates range from 80% to 96% across combinations. Provider routing, harness prompt structure and cache behavior can materially change the economics of running the same model given cached inputs are typically <50% the API price of regular input tokens. ➤ Time per task varies >7x: Opus 4.7 in Claude Code is fastest at ~6 minutes/task, while Kimi K2.6 in Claude Code is slowest at ~40 minutes/task. This is contributed to by differences in average turns per task, token usage and API serving speed. Opus 4.7 had materially lower amount of turns to complete a task than all other models while Kimi K2.6 had the most. ➤ Cursor made real progress with Composer 2: Composer 2 in Cursor CLI scores 48, near the leading open-weight model results, while being the cheapest combination measured at $0.07/task. Cursor has stated Composer 2 is built from Kimi K2.5, showcasing they have made substantial post-training gains. This is just the start. We are planning to add additional agents (both harnesses and models). Let us know what you would like to see added next.
显示更多
0
109
1.4K
161
转发到社区
🚨 YA EN VIVOOO 🚨 QSMP 2 / Día 32: Jueves de Lechits Superheroico (tengo 29 años) Los espero 🦸‍♂️
0
130
5.7K
381
转发到社区
《赛博朋克2077》 在游戏里在街头逛一逛,开着车绕一绕。哪怕不执行任务,也很开心。
大部分人养的龙虾。。。能力还不如 kimi agent k2.6。。。强烈建议试试kimi的agent模式,和agent swarm。。。 非常暴力
1次运行,直接生成 20,000 行数据集或 10 万字文献综述。 Moonshot AI 的 Kimi K2.6 Agent Swarm 把 AI 的商业价值从“聊天对话”强行拉到了“结构化资产产出”。 你觉得这种规模的自动化,会先让哪类白领失业?
显示更多
Kimi K2.6 在 1 小时前发布了,看起来非常棒!这里它正通过 MLX (mlx-vlm) 在两台 M3 Ultra 上运行(完整的 1T 参数量 VLM)。