Day 134/365 of GPU Programming
Spending the day reading the papers behind the benchmarks I keep seeing.
Starting with MMLU, GPQA, LongBench, and NoLiMa, along with their different iterations (v1 vs v2, standard vs pro, etc.).
Working on inference optimization the past few days made me realize I don't really know anything about benchmarks, so I'm trying to learn more about them, including their strengths and limitations.
Any other benchmarks I should look into more deeply?