Remember the llm.c repro of the GPT-2 (124M) training run? It took 45 min on 8xH100. Since then,
@kellerjordan0 (and by now many others) have iterated on it extensively in the new modded-nanogpt repo, which achieves the same result in only 5 min!
Love this repo 👏 600 LOC