Apparently today is the 4th anniversary of GPT-3!
Which I am accidentally celebrating by re-training the smallest model in the miniseries right now :). The paper reports HellaSwag 33.7 for it (Appendix H), and my run almost reached this a few steps ago (though only 45% of the training is done).
I remember quite clearly when the GPT-3 paper came out, because I had to interrupt work and go out for a walk.
The realization hit me that an important property of the field had flipped. In ~2011, progress in AI felt constrained primarily by algorithms. We needed better ideas, better modeling, better approaches to make further progress. If you had offered me a 10X bigger computer, I'm not sure what I would have even used it for. The GPT-3 paper showed that there was this thing that would just get better on a large variety of practical tasks if you only trained a bigger one. Better algorithms became a bonus, not a necessity, for progress toward AGI. Possibly not forever, but at least locally and for the time being, in a very practical sense. Today, if you gave me a 10X bigger computer, I would know exactly what to do with it, and then I'd ask for more. This property of AI also gets to the heart of why NVIDIA is a $2.8T company today. I'm not sure how others experienced it, but the realization convincingly clicked for me with GPT-3, 4 years ago.