Yun-Ta Tsai (@yunta_tsai)

2026.06.25 03:58

Getting used to being liked likely means you are overfit to RLHF. The problem with overfitting is that the pain overwhelms the limbic system once you try to sample trajectories outside the known distribution. As more people like you, your sampling regime becomes smaller and smaller to avoid negative feedback. Eventually you get stuck and become a slave to your own feelings. That’s why I have never seen a model student happy once they become a “model”. Their weights are frozen and cannot be updated anymore. They cannot risk being better than their own SOTA.
