
Runpeng Dai (@RunpengDai) “What if we model test time adaptive sampling as MDP? In our recent work, RL-Guid” — TopicDigg

Runpeng Dai
@RunpengDai
Third-year Ph.D. candidate at @UNC. Intern at @Apple, previously @Tencent AI Labs. My research sits at the intersection of Reinforcement Learning and LLMs.
Joined September 2025
45 Following    24 Followers
What if we model test-time adaptive sampling as an MDP? In our recent work, RL-Guided Adaptive Sampling, we model test-time sampling as an MDP and train a 4-layer MLP on CPU as the controller. This lightweight framework dynamically balances answer correctness, latency, and computation cost, relying only on lightweight statistics! 🚀 @zhengtoong @ruiliu0 @ChengsongH31219 @hongtuzhu1 @HongtuZ20093 📄 Paper: 💻 Code:
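To make the idea concrete, here is a minimal sketch of test-time sampling framed as a sequential decision problem. It is not the paper's implementation: the three state features, the layer widths, the stop/continue action set, and the names `light_stats`, `TinyMLP`, and `adaptive_sample` are all assumptions chosen for illustration. The controller weights are random and untrained; the RL training loop and the reward that trades off correctness, latency, and compute are not shown.

```python
import math
import random
from collections import Counter


def light_stats(answers, max_samples):
    """Hypothetical state: budget used, top-answer vote share, normalized entropy."""
    n = len(answers)
    counts = Counter(answers)
    top_share = counts.most_common(1)[0][1] / n
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    norm_entropy = entropy / math.log(n) if n > 1 else 0.0
    return [n / max_samples, top_share, norm_entropy]


class TinyMLP:
    """Four linear layers with ReLU in between; weights are random (untrained)."""

    def __init__(self, sizes=(3, 16, 16, 16, 2), seed=0):
        rng = random.Random(seed)
        self.layers = [
            ([[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def forward(self, x):
        for i, (weights, biases) in enumerate(self.layers):
            x = [sum(w * v for w, v in zip(row, x)) + b
                 for row, b in zip(weights, biases)]
            if i < len(self.layers) - 1:  # no activation on the output layer
                x = [max(0.0, v) for v in x]
        return x  # scores for the assumed actions [STOP, CONTINUE]


def adaptive_sample(sample_fn, controller, max_samples=16):
    """Draw answers one at a time until the controller prefers STOP or the budget ends."""
    answers = [sample_fn()]
    while len(answers) < max_samples:
        stop_score, continue_score = controller.forward(
            light_stats(answers, max_samples))
        if stop_score >= continue_score:
            break
        answers.append(sample_fn())
    # Final answer by majority vote over the samples drawn so far.
    return Counter(answers).most_common(1)[0][0], len(answers)
```

In this sketch `sample_fn` stands in for one LLM call that returns a final answer string. A controller this small is cheap to evaluate on CPU because its input is a handful of summary statistics rather than model activations.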