Runpeng Dai (@RunpengDai) “What if we model test time adaptive sampling as MDP? In our recent work, RL-Guid”

Runpeng Dai

@RunpengDai

Third-year Ph.D. candidate at @UNC. Intern at @Apple, previously @Tencent AI Labs. My research sits at the intersection of Reinforcement Learning and LLMs.

Joined September 2025

45 Following 24 Followers

Runpeng Dai @RunpengDai

2026.06.03 04:05

What if we model test-time adaptive sampling as an MDP? In our recent work, RL-Guided Adaptive sampling, we model test-time sampling as an MDP and train a 4-layer MLP on CPU as the controller. This lightweight framework dynamically balances answer correctness, latency, and computation cost while relying only on lightweight statistics! 🚀 @zhengtoong @ruiliu0 @ChengsongH31219 @hongtuzhu1 @HongtuZ20093 📄 Paper: 💻 Code:
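A rough sketch of what the MDP framing in the post could look like, not the paper's actual implementation. Everything here is an assumption made for illustration: the state features (top-answer share, answer entropy, fraction of budget used), the two actions (stop or draw another sample), the layer widths, and the names `light_statistics`, `TinyMLP`, and `adaptive_sample`. The weights are random, so this controller is untrained; in the post's setup it would be trained with RL against a reward that trades off correctness, latency, and compute.

```python
import math
import random
from collections import Counter

STOP, CONTINUE = 0, 1  # assumed action space: stop sampling or draw one more answer


def light_statistics(answers, budget):
    """State of the MDP: cheap summary statistics of the answers sampled so far."""
    counts = Counter(answers)
    n = len(answers)
    top_share = counts.most_common(1)[0][1] / n  # agreement with the majority answer
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return [top_share, entropy, n / budget]


class TinyMLP:
    """Four linear layers with ReLU in between; weights are random (untrained)."""

    def __init__(self, sizes=(3, 16, 16, 16, 2), seed=0):
        rng = random.Random(seed)
        self.layers = [
            ([[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])
        ]

    def forward(self, x):
        for i, (weights, biases) in enumerate(self.layers):
            x = [sum(w * v for w, v in zip(row, x)) + b
                 for row, b in zip(weights, biases)]
            if i < len(self.layers) - 1:  # no activation on the output layer
                x = [max(0.0, v) for v in x]
        return x  # logits for [STOP, CONTINUE]


def adaptive_sample(sample_fn, controller, budget=16):
    """Draw answers until the controller chooses STOP or the budget runs out."""
    answers = [sample_fn()]
    while len(answers) < budget:
        logits = controller.forward(light_statistics(answers, budget))
        if logits[STOP] >= logits[CONTINUE]:
            break
        answers.append(sample_fn())
    final_answer = Counter(answers).most_common(1)[0][0]  # majority vote
    return final_answer, len(answers)
```

Here `sample_fn` stands in for one LLM generation that returns a final answer string. The controller only ever sees three numbers per step, which is why a small MLP on CPU is plausible for this role.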
