[gif] me trying to read tinygrad code earlier :D
I think the LOC requirements (which are only a proxy for simplicity) led to too much compression. You wouldn't brag about your .min.js code being 1 LOC. Imo it would be a lot simpler if the code were given room to breathe and some comments. The optimization should be: minimize LOC subject to the constraint that the code is clean. Nothing that can't be fixed, though.
Re: using the code (aside from reading it), I'm happy to consider it and work with it as a baseline alongside PyTorch once it reaches 1.0. I've used PyTorch for many years, so it's the easy place to go for a strong baseline.
Btw, based on some comments it's worth clarifying that the llm.c repo and the TinyGrad repo are very different kinds of Pokémon. We both want to train LLMs fast. TinyGrad wants to be an actual compiler (think: gcc): take high-level descriptions of arbitrary networks and compile them to run fast on different backends. llm.c is more like a direct, assembly-level program, written by hand, for one very specific, narrow task (the GPT-2 training loop). Unlike your typical assembly program though, you get something low-level but still readable. Compilers will struggle to produce this, even if they match or beat the running time. Producing readable code is not usually a goal of a compiler.
So there are two ways to generate really fast code:
1) write a better compiler
2) write a better assembly-level program
At the end of the day it can be both. (2) is really fun to write and you're in complete control. And any optimizations that get done by hand can help improve and challenge (1) to emit them as a special case when appropriate. Also, (1) may find and emit optimizations that could be extremely tedious to do by hand. And of course the moment you want to do something different, you'll have a lot easier time with (1) over (2).
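A toy sketch of the contrast, for illustration only: this is not code from llm.c or TinyGrad, and every function name here is hypothetical. The same computation (scale, shift, ReLU) is written once as a chain of generic ops, roughly the high-level description you'd hand to a compiler in (1), and once as a single hand-fused loop in the spirit of (2).

```python
# Toy illustration only; names are hypothetical, not from llm.c or TinyGrad.

def mul(xs, w):
    # Generic op: one full pass over the data, allocates a new list.
    return [x * w for x in xs]

def add(xs, b):
    # Generic op: a second pass and a second temporary list.
    return [x + b for x in xs]

def relu(xs):
    # Generic op: a third pass.
    return [x if x > 0.0 else 0.0 for x in xs]

def composed(xs, w, b):
    # Route (1): a high-level description built from reusable pieces.
    # Easy to change, but three passes and two temporaries unless
    # something (a compiler) fuses them.
    return relu(add(mul(xs, w), b))

def fused(xs, w, b):
    # Route (2): written by hand for this one computation.
    # One pass, no intermediate lists, nothing reusable.
    out = [0.0] * len(xs)
    for i, x in enumerate(xs):
        v = x * w + b
        out[i] = v if v > 0.0 else 0.0
    return out
```

Both functions return the same values; the hope for (1) is that the compiler turns something like `composed` into something like `fused` on its own, while (2) just writes `fused` directly and has to be rewritten the moment the computation changes.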
One more radical and possibly under-appreciated thought, which may turn out to be wrong but which I think has a decent chance of being right: I think LLMs are going to become very good "compilers" and will be capable of directly emitting excellent assembly-level programs. Code like llm.c (and descendants) could one day be part of a few-shot prompt, to help the LLM compile the n+1 program.