A major advantage of using Keras 3 with the JAX backend: it's fast. And it's fast out of the box, without any need for careful performance optimization.
Gemma 2B inference for a single prompt runs at 116 tokens/s on a V100, a 3.3x speedup over the Hugging Face/PyTorch (HF/PT) implementation.
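For readers who want to check a throughput figure like this on their own hardware, here is a minimal sketch of a tokens-per-second timer. It is not the benchmark harness behind the number above: `tokens_per_second` and `dummy_generate` are hypothetical names introduced for illustration, and the stand-in generator only sleeps so the snippet runs without a model or GPU. The warm-up call matters with the JAX backend, because the first call includes JIT compilation time.

```python
import time


def tokens_per_second(generate_fn, prompt, num_tokens, warmup=1, runs=3):
    """Time generate_fn(prompt, num_tokens) and return the best tokens/s seen.

    generate_fn is any callable that produces num_tokens tokens for prompt
    and returns only once the output is fully materialized.
    """
    # Warm-up calls are excluded from timing (e.g. JIT compilation on first use).
    for _ in range(warmup):
        generate_fn(prompt, num_tokens)

    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        generate_fn(prompt, num_tokens)
        best = min(best, time.perf_counter() - start)
    return num_tokens / best


def dummy_generate(prompt, num_tokens):
    # Hypothetical stand-in for a real model: pretends each token takes ~1 ms.
    time.sleep(0.001 * num_tokens)
    return prompt


rate = tokens_per_second(dummy_generate, "Hello", num_tokens=20)
print(f"{rate:.1f} tokens/s")
```

To time a real model, replace `dummy_generate` with a wrapper around that model's own generation call, and set the `KERAS_BACKEND` environment variable to `jax` before importing Keras. Results will depend on prompt length, the number of generated tokens, numeric precision, and the accelerator, so they will not necessarily match the figure quoted above.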