JAX is fast ⏩
Benchmarking Mistral 7B inference on a V100 (float16, batch_size 10): the throughput of the KerasNLP implementation with JAX is over 2x higher than the Hugging Face PyTorch one (compiled).
Worth noting that this is "out of the box" performance: the KerasNLP model is not optimized for performance. It's written the way anyone would naively write a Keras 3 LLM.