XLA-compiled TF delivers the best performance (better than the previous tf.keras-based version!), and JAX is very close behind.
PyTorch isn't compiled, so it runs slower, but for the same reason it has a better cold-start time (no compilation overhead on the first batch).
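
To make the cold-start vs. steady-state distinction concrete, here is a minimal sketch of how the two could be timed separately. It is not the benchmark script behind the numbers above: `time_cold_and_warm` and `FakeCompiledStep` are hypothetical names, and the "compilation" is simulated with a sleep purely so the snippet runs without any framework installed.

```python
import time


def time_cold_and_warm(step_fn, batch, n_warm=5):
    """Return (first-call seconds, mean seconds over the next n_warm calls)."""
    start = time.perf_counter()
    step_fn(batch)  # first batch: includes any one-time compilation cost
    cold = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(n_warm):
        step_fn(batch)  # later batches: steady-state cost only
    warm = (time.perf_counter() - start) / n_warm
    return cold, warm


class FakeCompiledStep:
    """Hypothetical stand-in for a compiled train step: slow once, then fast."""

    def __init__(self, compile_seconds=0.05):
        self.compile_seconds = compile_seconds
        self.compiled = False

    def __call__(self, batch):
        if not self.compiled:
            # Simulated compilation delay (an assumption, not a real measurement).
            time.sleep(self.compile_seconds)
            self.compiled = True
        return sum(batch)


cold, warm = time_cold_and_warm(FakeCompiledStep(), [1.0, 2.0, 3.0])
print(f"first batch: {cold:.4f}s, later batches: {warm:.6f}s")
```

With a real backend, `step_fn` would be the actual train step, and it would need to wait for the result before the timer stops (in JAX, for example, by calling `block_until_ready()` on the output), otherwise asynchronous dispatch can hide most of the cost.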