

levi (@levidiamode)


365 days of GPU programming ▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░ 133/365

602 Following · 4.5K Followers

levi @levidiamode

2026.05.18 22:52

Day 135/365 of GPU Programming. Learning more about benchmarks and evals today, e.g. going through one of the lectures from last year's Transformers & Language Models course at Stanford and getting a better understanding of structured outputs, LLM-as-a-Judge, position/verbosity bias, quantifying factuality, tool use, failure modes, MMLU, AIME, PIQA, SWE-bench, HarmBench, etc.
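
Position bias in pairwise LLM-as-a-Judge setups is commonly checked by asking the judge twice with the two answers swapped and only declaring a winner when both orderings agree. A minimal sketch, assuming a hypothetical `judge` callable that returns "first", "second" or "tie" (this is not any particular library's API):

```python
from typing import Callable

# Hypothetical judge: (question, answer shown first, answer shown second) -> "first" | "second" | "tie"
Judge = Callable[[str, str, str], str]


def debiased_pairwise(judge: Judge, question: str, a: str, b: str) -> str:
    """Return "A", "B" or "tie" after querying the judge in both orders."""
    forward = judge(question, a, b)   # answer A shown first
    backward = judge(question, b, a)  # answer B shown first
    # Map each positional verdict back to the underlying answer label.
    fwd = {"first": "A", "second": "B"}.get(forward, "tie")
    bwd = {"first": "B", "second": "A"}.get(backward, "tie")
    # Declare a winner only if both orderings agree; otherwise call it a tie.
    return fwd if fwd == bwd else "tie"


if __name__ == "__main__":
    # A judge with pure position bias always picks whatever is shown first.
    always_first = lambda q, x, y: "first"
    print(debiased_pairwise(always_first, "q", "ans1", "ans2"))  # tie
```

Swapping only neutralizes position bias: a judge that always prefers the longer answer will agree with itself in both orders, so verbosity bias needs a separate check.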


levi @levidiamode

2026.05.17 22:23

Day 134/365 of GPU Programming. Spending the day reading the papers behind benchmarks I keep seeing, starting with MMLU, GPQA, LongBench and NoLiMa and their different iterations (v1 vs v2, standard vs Pro, etc.). Working on inference optimization over the past few days made me realize I don't really know anything about benchmarks, so I'm trying to become more aware of the various benchmarks, their strengths and their limitations. Any other benchmarks I should look into more deeply?


