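[Small models stun again! The 3B-parameter VibeThinker-3B comes close to the frontier in math reasoning]
Weibo AI just released VibeThinker-3B (built on Qwen2.5-Coder). It scores 94.3 on AIME26 (97.1 with CLR), 80.2 on LiveCodeBench v6, and a 96.1% pass rate on recent LeetCode contests... Those numbers beat a whole batch of far larger models.
The core is the Spectrum-to-Signal post-training strategy: diversity distillation plus RL optimization, focused on verifiable reasoning instead of the old route of piling on parameters for general knowledge.
Efficient small models like this are very appealing: they run locally, cost little, and reason well, which makes them a particularly good fit for math- and algorithm-heavy subtasks.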
HuggingFace:
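The small-model era really is here. Parameter count isn't everything; post-training is what matters. Do you think this 3B model could go straight into production?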
Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5-Coder. The paper doesn't provide many details, but it appears they distill from RL checkpoints and then run a final instruct-focused RL stage.