Andrej Karpathy (@karpathy) “New (2h13m 😅) lecture: "Let's build the GPT Tokenizer" Tokenizers are a complet” — TopicDigg

Andrej Karpathy
@karpathy
I like to train large deep neural nets. Previously Director of AI @ Tesla, founding team @ OpenAI, PhD @ Stanford.
Joined April 2009
1.1K Following · 2.5M Followers
New (2h13m 😅) lecture: "Let's build the GPT Tokenizer" Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and decode() back from tokens to strings. In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI.
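The pipeline the tweet describes — a separate training stage that learns byte-pair merges, then an encode()/decode() pair — can be sketched roughly as follows. This is a minimal illustration of byte-level BPE under my own naming, not the actual code from the lecture or from OpenAI's GPT tokenizer:

```python
# Minimal byte-level BPE sketch: train merges on raw UTF-8 bytes,
# then implement encode() (string -> tokens) and decode() (tokens -> string).

def get_stats(ids):
    """Count occurrences of each adjacent token pair."""
    counts = {}
    for a, b in zip(ids, ids[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def merge(ids, pair, idx):
    """Replace every occurrence of `pair` in `ids` with the new token `idx`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(idx)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train(text, vocab_size):
    """Learn (vocab_size - 256) merges on the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (token, token) -> new token id
    for idx in range(256, vocab_size):
        stats = get_stats(ids)
        if not stats:
            break
        pair = max(stats, key=stats.get)  # greedily pick the most frequent pair
        merges[pair] = idx
        ids = merge(ids, pair, idx)
    return merges

def encode(text, merges):
    """String -> tokens: repeatedly apply the earliest-learned applicable merge."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        stats = get_stats(ids)
        pair = min(stats, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no learned merge applies anymore
        ids = merge(ids, pair, merges[pair])
    return ids

def decode(ids, merges):
    """Tokens -> string: expand each merged token back into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), idx in merges.items():  # insertion order = training order
        vocab[idx] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

merges = train("aaabdaaabac", 259)      # learn 3 merges on a toy string
tokens = encode("aaabdaaabac", merges)  # compress the string into tokens
assert decode(tokens, merges) == "aaabdaaabac"  # lossless round-trip
```

The toy string compresses from 11 bytes to 5 tokens, and decode() inverts encode() exactly; real GPT tokenizers add details such as regex pre-splitting and special tokens on top of this core loop.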