
BURKOV (@burkov) “In this paper, a 7B language model trained with reinforcement learning learns to” — TopicDigg

BURKOV
@burkov
Books: & App: PhD in AI, author of 📖 The Hundred-Page LMs Book & The Hundred-Page ML Book
Joined June 2009
117 Following · 57.4K Followers
In this paper, a 7B language model trained with reinforcement learning learns to orchestrate larger frontier models like GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro. It does so by writing natural-language subtasks, assigning each to one of the workers, and specifying which previous outputs that worker sees in context. The resulting system outperforms every individual frontier model on benchmarks including GPQA Diamond, LiveCodeBench, and AIME25, while averaging about three model calls per question—fewer than the multi-agent pipelines and self-reflection loops it beats. The work provides evidence that prompt engineering and pipeline design, currently done by hand in commercial AI products, can be learned end-to-end through reward signals alone. Read with an AI tutor: PDF:
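The orchestration scheme described in the post can be sketched roughly as follows. This is a minimal illustration, not the paper's actual interface: the `Subtask` fields, the worker names, and the `fake_worker` stub are all assumptions standing in for the learned 7B orchestrator and real frontier-model API calls.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    # Natural-language instruction written by the orchestrator model.
    description: str
    # Which worker model handles this subtask (illustrative names).
    worker: str
    # Indices of earlier subtask outputs to place in this worker's context.
    context_ids: list = field(default_factory=list)

def run_plan(subtasks, call_worker):
    """Execute subtasks in order; each worker sees only the prior
    outputs the orchestrator explicitly routed to it."""
    outputs = []
    for task in subtasks:
        context = [outputs[i] for i in task.context_ids]
        outputs.append(call_worker(task.worker, task.description, context))
    return outputs

# Stub standing in for a real frontier-model API call.
def fake_worker(worker, description, context):
    return f"{worker} answered '{description}' using {len(context)} prior outputs"

plan = [
    Subtask("Derive the key equation", "gpt-5"),
    Subtask("Check the derivation", "claude-sonnet-4", context_ids=[0]),
    Subtask("Write the final answer", "gemini-2.5-pro", context_ids=[0, 1]),
]
results = run_plan(plan, fake_worker)
```

A three-step plan like this one matches the post's observation of roughly three model calls per question; the learned part is deciding the decomposition, the worker assignment, and the context routing.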