The United States did not attend the G20 in South Africa, because the South African Government refuses to acknowledge or address the horrific Human Rights Abuses endured by Afrikaners, and other descendants of Dutch, French, and German settlers. To put it more bluntly, they are killing white people, and randomly allowing their farms to be taken from them. Perhaps, worst of all, the soon to be out of business New York Times and the Fake News Media won’t issue a word against this genocide. That’s why all the Liars and Pretenders of the Radical Left Media are going out of business! At the conclusion of the G20, South Africa refused to hand off the G20 Presidency to a Senior Representative from our U.S. Embassy, who attended the Closing Ceremony. Therefore, at my direction, South Africa will NOT be receiving an invitation to the 2026 G20, which will be hosted in the Great City of Miami, Florida next year. South Africa has demonstrated to the World they are not a country worthy of Membership anywhere, and we are going to stop all payments and subsidies to them, effective immediately. Thank you for your attention to this matter!
[gif] me trying to read tinygrad code earlier :D
I think the LOC requirements (which are only a proxy for simplicity) led to too much compression. You wouldn't brag about your .min.js code being 1 LOC. Imo it would be a lot simpler if the code were given room to breathe and some comments. The optimization should be: minimize LOC subject to the constraint that the code is clean. Nothing that can't be fixed, though.
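To make the LOC point concrete, here is a toy sketch (my own example, not actual tinygrad code; both function names are made up for illustration): the same softmax written once squeezed into a single line and once with room to breathe. Both compute the same thing; only one is pleasant to read.

```python
import math

# Compressed: one line, correct, but the reader has to unpack it all at once.
def softmax_min(x): return [e / s for s in [sum(math.exp(v - max(x)) for v in x)] for e in [math.exp(v - max(x)) for v in x]]

# Same computation, a few more lines, with comments.
def softmax_readable(x):
    # Subtract the max for numerical stability; the result is mathematically unchanged.
    x_max = max(x)
    exps = [math.exp(v - x_max) for v in x]
    total = sum(exps)
    # Normalize so the outputs sum to 1.
    return [e / total for e in exps]
```

The second version costs a handful of extra lines, which is the trade the "minimize LOC subject to clean code" constraint is meant to allow.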
Re: using the code (aside from reading it), I'm happy to consider it and work with it as a baseline alongside PyTorch when it reaches 1.0. I've used PyTorch for many years, so it's easy to reach for as a strong baseline.
Btw, based on some comments, it's worth clarifying that the llm.c repo and the TinyGrad repo are very different kinds of pokemon. We both want to train LLMs fast. TinyGrad wants to be an actual compiler (think: gcc): take high-level descriptions of arbitrary networks and compile them to run fast on different backends. llm.c is more like a direct, assembly-level program, written by hand, for a very specific, narrow task (the GPT-2 training loop). Unlike your typical assembly program though, you get something low level but still readable. Compilers will struggle to produce this, even if they may match or surpass the running time. It's not usually a goal of a compiler to produce readable code.
So there are two ways to generate really fast code:
1) write a better compiler
2) write a better assembly-level program
At the end of the day it can be both. (2) is really fun to write and you're in complete control. And any optimizations that get done by hand can help improve and challenge (1) to emit them as a special case when appropriate. Also, (1) may find and emit optimizations that would be extremely tedious to do by hand. And of course the moment you want to do something different, you'll have a much easier time with (1) than with (2).
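A toy sketch of the two routes, in Python for brevity. This is purely illustrative and says nothing about how TinyGrad or llm.c are actually built: the op table, `compile_chain`, and `relu_2x_plus_1` are all hypothetical names, and the "compiler" here is just a closure builder standing in for the real thing.

```python
# Route (1): a toy "compiler". It takes a high-level description (a list of
# op names) and returns a runnable function for any chain of supported ops.
OPS = {
    "scale2": lambda v: 2.0 * v,
    "add1": lambda v: v + 1.0,
    "relu": lambda v: max(v, 0.0),
}

def compile_chain(op_names):
    fns = [OPS[name] for name in op_names]

    def run(xs):
        out = []
        for v in xs:
            # Apply each op in order to the current element.
            for fn in fns:
                v = fn(v)
            out.append(v)
        return out

    return run

# Route (2): written by hand for one narrow program, relu(2*x + 1), and
# nothing else. Everything is fused into a single readable expression.
def relu_2x_plus_1(xs):
    return [max(2.0 * v + 1.0, 0.0) for v in xs]
```

Both routes agree on the one program that (2) covers, e.g. `compile_chain(["scale2", "add1", "relu"])` and `relu_2x_plus_1` give the same outputs. But asking for a different chain is a one-line change for (1), while (2) has to be rewritten by hand.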
One more radical and possibly under-appreciated thought, which may turn out to be wrong but which I think has a decent chance of being right: LLMs are going to become very good "compilers" and will be capable of directly emitting excellent assembly-level programs. Code like llm.c (and descendants) could one day be part of a few-shot prompt, to help the LLM compile the n+1 program.