This progress allows us to rethink global compute:
🔘 We successfully trained a 12B @GoogleGemma model across four US regions using low-bandwidth networks
🔘 We showed we can mix different hardware generations, such as TPUv6e and TPUv5p, without slowing down training