Andrej Karpathy (@karpathy) “@bil16238531 GPT-3 model is GPT-2 but trained for longer (300B) tokens and yes o”

Andrej Karpathy

@karpathy

I like training large deep neural nets. MTS @ Anthropic. Previously Director of AI @ Tesla, founding team @ OpenAI, PhD @ Stanford.

Joined April 2009

1.1K Following 2.7M Followers

Andrej Karpathy @karpathy

2024.05.30 01:41

@bil16238531 The GPT-3 model is GPT-2 but trained for longer (300B tokens) and, yes, on a better dataset. FineWeb is a good dataset, so you can train your own like this. It will cost ~$500. To be accurate, use -b 32 -t 2048 instead, which gives the 2048-token GPT-3 context length.
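A minimal sketch of the arithmetic behind the flag swap. It assumes, as in llm.c's training scripts, that `-b` is the per-GPU micro-batch size in sequences and `-t` is the sequence length in tokens. The `-b 64 -t 1024` baseline and the ~0.5M-token total batch size (524,288) are assumptions for illustration and are not stated in the post. The helper names are hypothetical. The sketch shows why halving `-b` while doubling `-t` leaves the tokens per micro-batch unchanged, and roughly how many optimizer steps 300B tokens would take under that assumed total batch.

```python
# Hypothetical helpers; the flag semantics (-b = sequences per micro-batch,
# -t = tokens per sequence) and the batch sizes below are assumptions.

TOTAL_BATCH_TOKENS = 524_288          # assumed total batch size (~0.5M tokens)
TRAIN_TOKENS = 300_000_000_000        # 300B tokens, the figure from the post


def tokens_per_micro_batch(b: int, t: int) -> int:
    """Tokens in one forward/backward pass on a single GPU."""
    return b * t


def grad_accum_steps(b: int, t: int, num_gpus: int,
                     total: int = TOTAL_BATCH_TOKENS) -> int:
    """Micro-steps to accumulate before one optimizer update."""
    per_step = tokens_per_micro_batch(b, t) * num_gpus
    if total % per_step != 0:
        raise ValueError("total batch must be divisible by b * t * num_gpus")
    return total // per_step


def optimizer_steps(train_tokens: int = TRAIN_TOKENS,
                    total: int = TOTAL_BATCH_TOKENS) -> int:
    """Optimizer updates needed to consume train_tokens (integer ceiling)."""
    return -(-train_tokens // total)


if __name__ == "__main__":
    for b, t in [(64, 1024), (32, 2048)]:   # assumed baseline vs. GPT-3 context
        print(b, t, tokens_per_micro_batch(b, t), grad_accum_steps(b, t, num_gpus=8))
    print(optimizer_steps())
```

Because `b * t` is 65,536 in both cases, a fixed total batch size implies the same number of gradient-accumulation steps. Only the context each sequence sees changes.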
