levi (@levidiamode)

365 days of GPU programming ▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░ 133/365
602 Following    4.5K Followers
Day 135/365 of GPU Programming. Learning more about benchmarks and evals today, e.g. going through one of the lectures from last year's Transformers & Language Models at Stanford and getting a better understanding of structured outputs, LLM-as-a-Judge, position/verbosity bias, quantifying factuality, tool use, failure modes, MMLU, AIME, PIQA, SWE-bench, HarmBench, etc.
Day 134/365 of GPU Programming. Spending the day reading the papers behind benchmarks I keep seeing, starting with MMLU, GPQA, LongBench and NoLiMa and their different iterations (v1 vs v2, standard vs pro, etc). Working on inference optimization the past few days made me realize I don't really know anything about benchmarks, so I'm trying to become more aware of the various benchmarks, their strengths and limitations. Any other benchmarks I should look into more deeply?