@r_chirra I fixed it :) deployed live now. This was done by doing a round of synthetic data generation to collect a 1000 multi-turn conversations (given a bunch of information including the readme of the nanochat project), and then mixing that into midtraining and SFT. fun!