This sort of unusual dataset makes a great showcase for the power of tf.data :) Fully functional chaining, filter operations, super readable, etc.
And you get something ultra-performant: the pipeline is traced into a graph and runs as parallel C++ (no Python at all at runtime).
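For a feel of what that chaining looks like, here is a minimal sketch on toy data (not the randomized pipeline from this post). The range dataset, the even-number filter and the squaring step are all made up for illustration; the methods themselves (`filter`, `map`, `shuffle`, `batch`, `prefetch`) are standard `tf.data.Dataset` calls, and `tf.data.AUTOTUNE` assumes TensorFlow 2.4 or later.

```python
import tensorflow as tf

# Hypothetical toy data: the integers 0..9 stand in for real records.
ds = tf.data.Dataset.range(10)

pipeline = (
    ds
    .filter(lambda x: x % 2 == 0)            # keep even values only
    .map(lambda x: x * x,                    # square each value
         num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(buffer_size=5, seed=0)          # randomize element order
    .batch(2)                                # group into batches of up to 2
    .prefetch(tf.data.AUTOTUNE)              # overlap producer and consumer
)

# Iterating pulls batches out of the pipeline as tensors.
for batch in pipeline:
    print(batch.numpy())
```

Each method returns a new `Dataset`, so steps can be added, removed or reordered without touching the rest of the chain.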
Here's my randomized pipeline...