Nous Research
@NousResearch
World-class open source AI
Joined October 2020
24 Following · 172.9K Followers
Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data. During the first third of training, the model reads and predicts contiguous bags of tokens, averaging their embeddings on the input side and predicting the next bag with a modified cross-entropy on the output side. For the remainder of the run, it trains normally on next-token prediction. The inference-time model is identical to one produced by conventional pretraining. Validated at 270M, 600M, and 3B dense scales, and at 10B-A1B MoE. The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.
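The announcement describes the mechanism at a high level but does not give the exact loss. Below is a minimal PyTorch sketch of one superposition-phase step, assuming a bag size of 4, a trunk that maps (batch, seq, d_model) to the same shape, and a "modified cross-entropy" that simply averages the per-token log-loss over each target bag; all of these are assumptions for illustration, not the released recipe.

```python
import torch
import torch.nn.functional as F

def make_bags(token_ids: torch.Tensor, bag_size: int) -> torch.Tensor:
    """Group a 1-D tensor of token ids into contiguous bags.

    Hypothetical helper: the post does not say how sequence
    remainders are handled; this sketch drops them.
    """
    n_bags = token_ids.numel() // bag_size
    return token_ids[: n_bags * bag_size].view(n_bags, bag_size)

def tst_loss(embed, trunk, lm_head, token_ids, bag_size=4):
    """One superposition-phase loss (sketch, not the released code).

    Input side: each bag of `bag_size` token embeddings is averaged
    into a single input vector, so the trunk sees a sequence that is
    `bag_size` times shorter.
    Output side: bag t predicts bag t+1. The exact "modified
    cross-entropy" is not specified in the post; here we average the
    standard cross-entropy over every token in the target bag.
    """
    bags = make_bags(token_ids, bag_size)    # (n_bags, bag_size)
    x = embed(bags).mean(dim=1)              # (n_bags, d_model)
    h = trunk(x.unsqueeze(0)).squeeze(0)     # (n_bags, d_model)
    logits = lm_head(h[:-1])                 # (n_bags - 1, vocab)
    targets = bags[1:]                       # (n_bags - 1, bag_size)
    log_probs = F.log_softmax(logits, dim=-1)
    # Gather the log-probability of each token in the next bag and
    # average: one plausible form of a bag-level cross-entropy.
    return -log_probs.gather(-1, targets).mean()
```

Since the superposition phase feeds the transformer one position per bag, attention and MLP work per training token drop by roughly the bag size, which is presumably where the wall-clock savings in that phase come from. Only the input averaging and the loss differ from standard pretraining, consistent with the post's claim that the inference-time model is identical to a conventionally pretrained one.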