注册并分享邀请链接,可获得视频播放与邀请奖励。

Andrej Karpathy (@karpathy) “Example here is the llm.c GPT-3 (124M) training on FineWeb (figure cropped at 25” — TopicDigg

Andrej Karpathy 的个人资料封面
Andrej Karpathy 的头像
Andrej Karpathy
@karpathy
I like training large deep neural nets. MTS @ Anthropic. Previously Director of AI @ Tesla, founding team @ OpenAI, PhD @ Stanford.
加入 April 2009
1.1K 正在关注    2.7M 粉丝
Example here is the llm.c GPT-3 (124M) training on FineWeb (figure cropped at 250B tokens), we seem to surpass GPT-3 HellaSwag (green line) at ~150B tokens, per paper expected this to be at 300B tokens. Will re-run with FineWeb-Edu. I do want to be a bit careful on conclusions though because HellaSwag is just one eval, mostly targeting English sentences and a multiple choice of their likely continuations in "tricky" settings. It may be that the GPT-2/3 datasets were a lot broader (e.g. more multilingual than FineWeb, or a lot more math/code than FineWeb, etc.). So it's likely we want to expand the set of evals to make more confident statements and comparisons.
显示更多
0
9
393
20
转发到社区