Andrej Karpathy (@karpathy) “@bil16238531 GPT-3 model is GPT-2 but trained for longer (300B) tokens and yes o” — TopicDigg

Andrej Karpathy
@karpathy
I like training large deep neural nets. MTS @ Anthropic. Previously Director of AI @ Tesla, founding team @ OpenAI, PhD @ Stanford.
Joined April 2009
1.1K Following    2.7M Followers
@bil16238531 The GPT-3 model is GPT-2 trained for longer (300B tokens) and, yes, on a better dataset. FineWeb is a good dataset, so you can train your own like this. It will cost ~$500. To be accurate to GPT-3's 2048-token context length, use -b 32 -t 2048 instead.
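A minimal sketch of the token arithmetic behind the suggested flags, assuming -b is the number of sequences per micro-batch and -t is the sequence length in tokens (as in a typical GPT-2 training script such as llm.c). The -b 64 -t 1024 baseline and the 2**19-token total batch per optimizer step are illustrative assumptions, not values stated in the post. The point: halving the batch while doubling the context keeps tokens per micro-batch unchanged, and a 300B-token budget maps to a step count once a total batch size is fixed.

```python
# Sketch only. Assumptions: -b = sequences per micro-batch, -t = tokens per sequence.
# The baseline (-b 64 -t 1024) and the 2**19-token total batch are hypothetical.

def tokens_per_micro_batch(batch_size: int, seq_len: int) -> int:
    """Tokens processed in one forward/backward pass on one device."""
    return batch_size * seq_len

def steps_for_budget(total_tokens: int, tokens_per_step: int) -> int:
    """Optimizer steps needed to consume a token budget, rounded up."""
    return -(-total_tokens // tokens_per_step)  # ceiling division on integers

baseline = tokens_per_micro_batch(64, 1024)  # assumed GPT-2-length setting
gpt3_ctx = tokens_per_micro_batch(32, 2048)  # setting from the post
print(baseline, gpt3_ctx)  # 65536 65536

TOKENS_PER_STEP = 2**19  # hypothetical total batch: 524,288 tokens per step
print(steps_for_budget(300_000_000_000, TOKENS_PER_STEP))  # 572205
```

Keeping tokens per micro-batch constant means per-device memory for activations scales roughly with sequence length times batch, so the longer context fits in about the same budget, though attention cost per token grows with the longer sequence.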