Runpeng Dai (@RunpengDai)

Third-year Ph.D. candidate at @UNC. Intern at @Apple, previously @Tencent AI Labs. My research sits at the intersection of Reinforcement Learning and LLMs.
45 Following    24 Followers
What if we model test-time adaptive sampling as an MDP? In our recent work, RL-Guided Adaptive Sampling, we formulate test-time sampling as an MDP and train a 4-layer MLP on CPU as the controller. This lightweight framework dynamically balances answer correctness, latency, and computation cost while relying only on lightweight statistics! 🚀 @zhengtoong @ruiliu0 @ChengsongH31219 @hongtuzhu1 @HongtuZ20093 📄 Paper: 💻 Code:
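
For readers who want a concrete picture of the idea in the post, here is a minimal Python sketch. It is not the authors' implementation: the state features (sample count, top-vote share, answer entropy), the two actions (continue or stop), the layer sizes, and every name in it (`features`, `MLP`, `adaptive_sample`, `sample_fn`) are assumptions made for illustration. The controller weights are random rather than trained, and the reward that would trade off correctness, latency, and compute during RL training is left out.

```python
import math
import random
from collections import Counter


def features(answers, max_samples):
    """Light statistics of the answers drawn so far (the assumed MDP state)."""
    n = len(answers)
    counts = Counter(answers)
    top_share = counts.most_common(1)[0][1] / n
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return [n / max_samples, top_share, entropy]


class MLP:
    """Small MLP with ReLU activations; weights are random here, NOT trained."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        # One (weights, biases) pair per layer; four pairs for five sizes.
        self.layers = [
            ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def forward(self, x):
        for i, (weights, biases) in enumerate(self.layers):
            x = [sum(w * v for w, v in zip(row, x)) + b
                 for row, b in zip(weights, biases)]
            if i < len(self.layers) - 1:  # no ReLU on the output layer
                x = [max(0.0, v) for v in x]
        return x  # logits for the assumed actions [continue, stop]


def adaptive_sample(sample_fn, controller, max_samples=16):
    """Draw answers until the controller chooses 'stop' or the budget runs out."""
    answers = [sample_fn()]
    while len(answers) < max_samples:
        logits = controller.forward(features(answers, max_samples))
        if logits[1] > logits[0]:  # action 1 = stop sampling
            break
        answers.append(sample_fn())
    majority = Counter(answers).most_common(1)[0][0]
    return majority, len(answers)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for sampling one answer from an LLM.
    fake_llm = lambda: rng.choice(["42", "42", "42", "41"])
    controller = MLP([3, 16, 16, 16, 2])  # four weight layers
    print(adaptive_sample(fake_llm, controller))
```

In a trained version, the controller's weights would be learned so that stopping early is rewarded when the majority answer is already reliable; here the loop only shows how a state built from cheap statistics can drive the continue-or-stop decision.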