Then it defines a Transformer encoder, which is your usual Transformer block, as well as a Transformer decoder, which is also your usual Transformer block, but with causal attention to prevent later timesteps to influence the decoding of earlier timesteps.