

John Carmack
@ID_AA_Carmack
AGI at Keen Technologies, former CTO Oculus VR, Founder Id Software and Armadillo Aerospace
Joined August 2010
285 Following    1.6M Followers
I always lost performance when I tried to use silu/gelu activations in my RL value networks, and I finally understand why. If the pre-activation values are small, the smooth curve through zero is basically a linear activation, destroying the representation power of the network. You need a batch/layer/rms norm on the preactivations to put them in the range the smooth activations are designed for. Internal norms generally hurt performance on our RL tasks, but combining them with a smooth activation at least works basically as well as a raw relu (but slower). So, not actually a win, but the lightbulb of understanding was good!
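A minimal sketch of the point about small pre-activations, assuming PyTorch (the layer sizes and the `ValueBlock` name are illustrative, not Carmack's actual network): for inputs near zero, silu(x) = x * sigmoid(x) is approximately 0.5x, i.e. essentially linear, so a stack of such layers collapses toward an affine map; normalizing the pre-activations rescales them into the curved part of the activation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# For small pre-activations, SiLU is nearly linear: silu(x) = x * sigmoid(x) ~= 0.5 * x.
x = torch.linspace(-0.05, 0.05, 5)
print(F.silu(x))   # almost exactly 0.5 * x
print(0.5 * x)

class ValueBlock(nn.Module):
    """Hypothetical value-network block: normalize pre-activations, then apply SiLU."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.norm = nn.RMSNorm(dim)  # PyTorch >= 2.4; nn.LayerNorm is a common alternative
        self.act = nn.SiLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        pre = self.linear(h)             # pre-activations, possibly tiny in scale
        return self.act(self.norm(pre))  # norm puts them in the O(1) range SiLU expects

block = ValueBlock(dim=64)
out = block(torch.randn(8, 64) * 0.01)   # even tiny inputs now reach the nonlinear regime
```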