Andrej Karpathy (@karpathy) “New (2h13m 😅) lecture: "Let's build the GPT Tokenizer" Tokenizers are a complet” — TopicDigg

Andrej Karpathy
@karpathy
I like to train large deep neural nets. Previously Director of AI @ Tesla, founding team @ OpenAI, PhD @ Stanford.
Joined April 2009
1.1K Following · 2.5M Followers
New (2h13m 😅) lecture: "Let's build the GPT Tokenizer" Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and decode() back from tokens to strings. In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI.
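The pipeline the tweet describes — a separate training stage that learns byte-pair merges, then an encode()/decode() pair — can be sketched roughly as follows. This is a minimal illustration of byte-level BPE under my own naming, not the actual code from the lecture or from OpenAI's GPT tokenizer:

```python
# Minimal byte-level BPE sketch: train merges on raw UTF-8 bytes,
# then implement encode() (string -> tokens) and decode() (tokens -> string).

def get_stats(ids):
    """Count occurrences of each adjacent token pair."""
    counts = {}
    for a, b in zip(ids, ids[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def merge(ids, pair, idx):
    """Replace every occurrence of `pair` in `ids` with the new token `idx`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(idx)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train(text, vocab_size):
    """Learn (vocab_size - 256) merges on the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (token, token) -> new token id
    for idx in range(256, vocab_size):
        stats = get_stats(ids)
        if not stats:
            break
        pair = max(stats, key=stats.get)  # greedily pick the most frequent pair
        merges[pair] = idx
        ids = merge(ids, pair, idx)
    return merges

def encode(text, merges):
    """String -> tokens: repeatedly apply the earliest-learned applicable merge."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        stats = get_stats(ids)
        pair = min(stats, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no learned merge applies anymore
        ids = merge(ids, pair, merges[pair])
    return ids

def decode(ids, merges):
    """Tokens -> string: expand each merged token back into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), idx in merges.items():  # insertion order = training order
        vocab[idx] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

merges = train("aaabdaaabac", 259)      # learn 3 merges on a toy string
tokens = encode("aaabdaaabac", merges)  # compress the string into tokens
assert decode(tokens, merges) == "aaabdaaabac"  # lossless round-trip
```

The toy string compresses from 11 bytes to 5 tokens, and decode() inverts encode() exactly; real GPT tokenizers add details such as regex pre-splitting and special tokens on top of this core loop.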