
Search results for "D10"

D10 Tieba
One keyword is one tieba; each tieba's path is unique across the site.
Users
None found
Posts containing D10
Behind the scenes of the day we collaborated with #December10# 👀 Thank you so much ✨ TikTok🔗 Instagram🔗 #超十代# #D10# #NMB48# #青春のデッドライン#
わたしを彼女にするメリット⬇️ ・身長が135cmしかない ・でもおむねはMかぷある ・顔がめっちゃかわいい ・おちりも110cmある ・えちなコスプレしてくれる ・一緒にゲームとアニメを楽しめる ・男の気持ちを分かっている ・付き合ったらめっちゃ一途 デメリット⬇️
I BEG YOU AH MAKE CYBERSTAN AS BULLSHIT AS DAY ONE OSHAUNE BOT FRONT D10 IS TOO FUCKING EASY FOR THE LONGEST TIME EVEN INC CORP PALES IN COMPARISON TO BUGS PREDATOR STRAIN AND MFS HAS BEEN GLAZING BOT FRONT LIKE NO END SO MAYBE ITS TIME TO ACTUALLY MAKE BOTS HARD AGAIN RAAH
New post: nanochat miniseries v1

The correct way to think about LLMs is that you are not optimizing for a single specific model but for a family of models controlled by a single dial (the compute you wish to spend) to achieve monotonically better results. This allows you to do careful science of scaling laws, and ultimately this is what gives you the confidence that when you pay for "the big run", the extrapolation will work and your money will be well spent.

For the first public release of nanochat my focus was on an end-to-end setup that runs the whole LLM pipeline with all of its stages. Now, after YOLOing a few runs earlier, I'm coming back around to flesh out some of the parts that I sped through, starting of course with pretraining, which is both computationally heavy and critical as the foundation of intelligence and knowledge in these models.

After locally tuning some of the hyperparameters, I swept out a number of models at a fixed FLOPs budget. (For every FLOPs target you can train a small model for a long time, or a big model for a short time.) It turns out that nanochat obeys very nice scaling laws, basically reproducing the Chinchilla paper plots:
[figure]
Which is just a baby version of this plot from Chinchilla:
[figure]

Very importantly and encouragingly, the exponents relating compute-optimal N (parameters) and D (tokens) to compute are equal at ≈0.5, so, just like Chinchilla, we get a single (compute-independent) constant that relates model size to the token training horizon (tokens per parameter). In Chinchilla this was measured to be 20. In nanochat it seems to be 8!

Once we can train compute-optimal models, I swept out a miniseries from d10 to d20, which are the nanochat sizes that can fit a batch size of 2**19 ≈ 0.5M tokens on an 8XH100 node without gradient accumulation. We get pretty, non-intersecting training plots for each model size. Then the fun part is relating this miniseries v1 to the GPT-2 and GPT-3 miniseries so that we know we're on the right track.
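The compute-optimal sizing described above can be sketched in a few lines. This is a minimal illustration, not nanochat's code: it assumes the common C ≈ 6ND approximation for training FLOPs (not stated in the post) and uses the post's tokens-per-parameter constants (20 for Chinchilla, about 8 for nanochat). The helper names, the FLOPs budget, and the per-GPU batch and sequence length are hypothetical; the last function only checks how a 2**19-token batch can fit on 8 GPUs without gradient accumulation.

```python
import math

# Assumption (not from the post): training compute C ~= 6 * N * D FLOPs,
# with N = parameter count and D = training tokens.
FLOPS_PER_PARAM_TOKEN = 6


def tokens_for_budget(flops: float, n_params: float) -> float:
    """Tokens a model with n_params can train on within a fixed FLOPs budget."""
    return flops / (FLOPS_PER_PARAM_TOKEN * n_params)


def compute_optimal(flops: float, tokens_per_param: float) -> tuple[float, float]:
    """Return (N, D) with D = r * N, so that C = 6 * r * N**2.

    N and D then both grow as C**0.5, which is why a single ratio r
    works at every budget.
    """
    n = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    return n, tokens_per_param * n


def grad_accum_steps(total_batch_tokens: int, device_batch: int,
                     seq_len: int, n_gpus: int) -> int:
    """Micro-steps per optimizer step; 1 means no gradient accumulation."""
    tokens_per_micro_step = device_batch * seq_len * n_gpus
    if total_batch_tokens % tokens_per_micro_step != 0:
        raise ValueError("total batch must be a multiple of one micro-step")
    return total_batch_tokens // tokens_per_micro_step


if __name__ == "__main__":
    budget = 1e19  # hypothetical FLOPs budget, not a figure from the post
    for name, r in [("Chinchilla", 20), ("nanochat", 8)]:
        n, d = compute_optimal(budget, r)
        print(f"{name}: N ~ {n:.3g} params, D ~ {d:.3g} tokens")
    # Hypothetical per-GPU setting: 32 sequences x 2048 tokens on each of 8 GPUs.
    print(grad_accum_steps(2**19, device_batch=32, seq_len=2048, n_gpus=8))  # -> 1
```

Because N and D both scale as the square root of compute under these assumptions, quadrupling the budget doubles both the model size and the token count while keeping their ratio fixed.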
Validation loss has many issues and is not comparable across models, so instead I use the CORE score (from the DCLM paper). I calculated it for GPT-2 and estimated it for GPT-3, which finally lets us put nanochat nicely on the same scale:
[figure]

The total cost of this miniseries is only ~$100 (~4 hours on 8XH100). These experiments give us confidence that everything is working fairly nicely and that if we pay more (turn the dial), we get increasingly better models.

TLDR: we can train compute-optimal miniseries and relate them to GPT-2/3 via objective CORE scores, but further improvements are desirable and needed. E.g., matching GPT-2 currently needs ~$500, but imo it should be possible to do it for <$100 with more work.

Full post with a lot more detail is here:

And all of the tuning and code is pushed to master, and people can reproduce these with the scaling_laws.sh and miniseries.sh bash scripts.
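The cost figures quoted above support some quick arithmetic. This is a minimal sketch, assuming cost scales linearly with GPU-hours at one flat rate; the post gives no hourly rate, so the rate below is only what ~$100 for ~4 hours on 8XH100 implies. The variable and function names are hypothetical.

```python
# Figures quoted in the post; everything derived from them is an estimate.
MINISERIES_COST_USD = 100.0  # "~$100"
MINISERIES_HOURS = 4.0       # "~4 hours"
GPUS_PER_NODE = 8            # "8XH100"

# Assumption: cost is proportional to GPU-hours at a single flat rate.
gpu_hours = MINISERIES_HOURS * GPUS_PER_NODE    # H100-hours for the miniseries
implied_rate = MINISERIES_COST_USD / gpu_hours  # USD per H100-hour


def gpu_hours_for(cost_usd: float, rate: float = implied_rate) -> float:
    """H100-hours a given budget buys at the implied flat rate."""
    return cost_usd / rate


def node_hours_for(cost_usd: float) -> float:
    """Wall-clock hours on one 8XH100 node for a given budget."""
    return gpu_hours_for(cost_usd) / GPUS_PER_NODE


print(f"implied rate: ${implied_rate:.2f} per H100-hour")
print(f"~$500 (GPT-2 match today): {node_hours_for(500):.0f} node-hours")
print(f"<$100 target: under {node_hours_for(100):.0f} node-hours")
```

Under that flat-rate assumption, the ~$500 GPT-2 match corresponds to roughly 20 hours on one node, and the <$100 goal to less than the ~4 hours the whole miniseries took.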
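So happy that gifts 🎁 keep arriving from so many people (´。✪ω✪。 ` ) The photo is あまえちゃん♡♡♡ Love you!!!!! Let's collab (((o(*゚▽゚*)o)))
Riddle in the Platinum Jacket🌹⟡.·*. #39Dハロ# #398のしゃめ# #Dハロ仮装2024# #Dハロ仮装# At the Land (Tokyo Disneyland) with the Queen 🏰♥️ (Twisted Wonderland/twst) Went in wearing the D100 costume🫶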
𝙉𝙚𝙬 𝘼𝙧𝙧𝙞𝙫𝙖𝙡  ̄ ̄ ̄ ̄ ̄ A sexy high-leg nun cosplay with a veil🖤 🫱
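#D100twstCosplay0608# Yesterday I joined the D100 Platinum costume group meetup hosted by 鹿さん, cosplaying Trey♣️✨️ All 22 of us together was breathtaking✨️ A colorful, happy Heartslabyul🫶 #398のしゃめ# Credits in the replies🫶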