Search for tweets and users related to Baby_Its_Both

2024.11.26 02:00

Can't stop grooving to 'Baby It’s Both (Tick-Tack English Ver.) Feat. Ava Max' 💃🏻🕺🏾 with @applemusic #FridayFeeling 🪩 💿 #ILLIT #아일릿 #I_LL_LIKE_YOU #Baby_Its_Both #Tick_Tack


0

4

1.1K

182


Nic0le@nicole_clash

2026.06.02 00:53

Today I am announcing META-Bench, the first pure intelligence benchmark for AI. It leverages the hit auto-battler strategy game, TFT. I SWEAR I AM NOT TROLLING, let me explain.

The industry suffers from labs overfitting and giving us models that score high despite being fundamentally low IQ. Over the years there have been many attempts at benchmarking AI with competitive gaming. I am going to explain the failure points, and why META-Bench is truly the first of its kind.

Chess. When picking a game to benchmark with, chess is the obvious first choice. It has clear rules, a large player base, and a well-defined Elo system. The issue with static-rule games, though, is that the best strategies can be figured out ahead of time and baked into the model during the training process. Too easily hacked. Memorizing more strategies is not proof of intelligence.

Dota 2 / League. We’ve all heard of OpenAI Five. The issue with benchmarking on a MOBA is that reaction speed is a meaningless metric. We do not need our highly intelligent AI to be able to respond at the speed of top human pro players. And truth be told, we are years away from an LLM that is able to play MOBAs at the highest levels off of vision alone, even though the problem was seemingly solved years ago.

What we need is a game that:
- Has defined rules but cannot be results-hacked during the training process
- Has a large ecosystem of human players
- Has clear-cut results and an Elo system
- Has results that are not reaction-time dependent

There is only ONE game in the world that meets all the requirements needed for this benchmark. Teamfight Tactics.

For those unfamiliar, TFT is a strategy-based auto-battler created by Riot Games with ~100 million monthly active players worldwide. It is a highly competitive multiplayer turn-based game. It’s as if Chess and League of Legends had a baby that’s born to be an AI benchmark:
- There is a new set released every 3 months.
- Time limitations in the 10-40 second range rather than the milliseconds required for MOBAs
- Skill based enough for esports yet uncertain enough to require reasoning over hard scripts

“Can’t labs just train models to be good at TFT?” Nope, and the reason why it’s unhackable comes down to how the benchmark itself is set up. Because the entire game is changed every 3 months and patched every 2 weeks, any data on a previous TFT set is effectively useless when it comes to raw pattern recognition.

Strategy-wise, there are core concepts that carry over from set to set. That’s why we have the same players hitting the highest Elo every season even though each set is so different. Any efforts at overfitting here can be fully negated if the benchmark harness used for all models has every core strategy built in. You are never going to beat a carefully curated harness layer with strategy training at the model layer. By presenting the models in the harness with the same core strategic concepts, the only difference in outputs will be each model's ability to reason across the different scenarios of each game. The luck elements of TFT already ensure that no 2 games will be the same in the reasoning required. Run the models against each other enough times and you will have a clear winner. Aka, the world’s first true IQ test for AI.

I really, really want to know which AI model would win this. So I am going to build this. Not too sure how I’m going to fund it yet, so if you would like to invest, HMU. I’m also looking to put together a small team of individuals who are both high Elo in TFT and highly experienced with agentic AI. And if you are even remotely curious about the results, like and help share this post 🫡
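The post does not say how "run the models against each other enough times" would be turned into a ranking. One possible reading, sketched here purely as an illustration, is to treat each lobby's finishing order as a set of pairwise wins and apply the standard Elo expected-score formula. The pairwise decomposition, the K-factor of 8, the starting ratings, and the model names are all assumptions, not part of META-Bench or of Riot's ranked system.

```python
from itertools import combinations


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_from_lobby(ratings: dict, placements: list, k: float = 8.0) -> dict:
    """Return new ratings after one lobby.

    placements lists the (hypothetical) model names from 1st place to last.
    Every pair is scored as a win for the higher-placed model; this pairwise
    treatment is an assumption made for the sketch.
    """
    deltas = {name: 0.0 for name in placements}
    # combinations() preserves list order, so `winner` always placed above `loser`.
    for winner, loser in combinations(placements, 2):
        gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        deltas[winner] += gain
        deltas[loser] -= gain
    return {name: ratings[name] + deltas.get(name, 0.0) for name in ratings}


# Hypothetical three-model lobby, all starting at 1500.
start = {"model_a": 1500.0, "model_b": 1500.0, "model_c": 1500.0}
after = update_from_lobby(start, ["model_a", "model_b", "model_c"])
print(after)
```

Because each pairwise update is zero-sum, the total rating pool stays constant, and repeating the update over many lobbies lets the luck in any single game average out.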


0

9

38

1


Andrej Karpathy@karpathy

2026.01.07 23:01

New post: nanochat miniseries v1

The correct way to think about LLMs is that you are not optimizing for a single specific model but for a family of models controlled by a single dial (the compute you wish to spend) to achieve monotonically better results. This allows you to do careful science of scaling laws and ultimately this is what gives you the confidence that when you pay for "the big run", the extrapolation will work and your money will be well spent.

For the first public release of nanochat my focus was on an end-to-end pipeline that runs the whole LLM pipeline with all of its stages. Now after YOLOing a few runs earlier, I'm coming back around to flesh out some of the parts that I sped through, starting of course with pretraining, which is both computationally heavy and critical as the foundation of intelligence and knowledge in these models.

After locally tuning some of the hyperparameters, I swept out a number of models fixing the FLOPs budget. (For every FLOPs target you can train a small model for a long time, or a big model for a short time.) It turns out that nanochat obeys very nice scaling laws, basically reproducing the Chinchilla paper plots:

Which is just a baby version of this plot from Chinchilla:

Very importantly and encouragingly, the exponent on N (parameters) and D (tokens) is equal at ~=0.5, so just like Chinchilla we get a single (compute-independent) constant that relates the model size to token training horizons. In Chinchilla, this was measured to be 20. In nanochat it seems to be 8!

Once we can train compute optimal models, I swept out a miniseries from d10 to d20, which are nanochat sizes that can do 2**19 ~= 0.5M batch sizes on an 8XH100 node without gradient accumulation. We get pretty, non-intersecting training plots for each model size.

Then the fun part is relating this miniseries v1 to the GPT-2 and GPT-3 miniseries so that we know we're on the right track. Validation loss has many issues and is not comparable, so instead I use the CORE score (from the DCLM paper). I calculated it for GPT-2 and estimated it for GPT-3, which allows us to finally put nanochat nicely on the same scale:

The total cost of this miniseries is only ~$100 (~4 hours on 8XH100). These experiments give us confidence that everything is working fairly nicely and that if we pay more (turn the dial), we get increasingly better models.

TLDR: we can train compute optimal miniseries and relate them to GPT-2/3 via objective CORE scores, but further improvements are desirable and needed. E.g., matching GPT-2 currently needs ~$500, but imo it should be possible to do it for <$100 with more work.

Full post with a lot more detail is here:

And all of the tuning and code is pushed to master and people can reproduce these with the scaling_laws.sh and miniseries.sh bash scripts.
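To make the "single constant" remark concrete, here is a minimal sketch of how a fixed tokens-per-parameter ratio turns a FLOPs budget into a model size and a token count. It assumes the common approximation C ≈ 6·N·D for training compute, which the post does not state; the function name and the example budget are hypothetical. The ratios 20 (Chinchilla) and 8 (nanochat) are the ones quoted in the post.

```python
import math


def compute_optimal_split(flops_budget: float, tokens_per_param: float):
    """Return (N, D) such that D = r * N and 6 * N * D = C.

    Assumes C ~= 6 * N * D (a common rule of thumb, not taken from the post).
    Solving gives N = sqrt(C / (6 * r)), so both N and D scale as C ** 0.5,
    matching the equal ~0.5 exponents described above.
    """
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


budget = 1e18  # hypothetical FLOPs budget, chosen only for illustration
for label, ratio in [("Chinchilla ratio 20", 20.0), ("nanochat ratio 8", 8.0)]:
    n, d = compute_optimal_split(budget, ratio)
    print(f"{label}: N ~ {n:.3e} params, D ~ {d:.3e} tokens")
```

Under this approximation, a smaller ratio shifts the same budget toward a larger model trained on fewer tokens.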


0

227

5.4K

675


ILLIT Official Japan@ILLITjpofficial

2024.11.22 05:10

#ILLIT "Baby It’s Both (Tick-Tack English Ver.) Feat. Ava Max" is now available to stream ✨ Please give it lots of listens 💚 🎧 #I_LL_LIKE_YOU #Tick_Tack


0

4

1.4K

226


Theodora Moutinho@teddybearosito

2024.02.06 06:41

baby its cold outside ❄️

0

25

2.8K

118


Joselyine Chap❤️@joselyinechap

2026.06.07 15:57

American influencer Sam Jones risks being expelled from Australia after being filmed taking a baby wombat from its distressed mother for her videos.

0

104

365

45


Wolf@WolfprwX

2026.06.11 17:43

Monkey saves baby rabbit & its mother.

0

96

3.8K

203


Pirat_Nation 🔴@Pirat_Nation

2026.05.28 15:07

Temu has been fined $232 million by the European Union after regulators found unsafe and illegal products being sold on the platform. The investigation uncovered items including baby toys with banned chemicals and phone chargers that failed basic safety tests. Officials said some products could pose serious risks to consumers, especially children. EU regulators said the problem went beyond a few bad products and accused Temu of not doing enough to monitor sellers or remove unsafe items. Temu said it disagrees with the decision but has already improved its safety and compliance systems.


0

42

747

66


CZ 🔶 BNB@cz_binance

2023.11.21 20:36

Today, I stepped down as CEO of Binance. Admittedly, it was not easy to let go emotionally. But I know it is the right thing to do. I made mistakes, and I must take responsibility. This is best for our community, for Binance, and for myself.

Binance is no longer a baby. It is time for me to let it walk and run. I know Binance will continue to grow and excel with the deep bench it has.

I’m pleased to announce that @_RichardTeng, our now former Global Head of Regional Markets, has been named the new CEO of Binance today. Richard is a highly qualified leader and, with over three decades of financial services and regulatory experience, he will navigate the company through its next period of growth. He will ensure Binance delivers on our next phase of security, transparency, compliance, and growth. Prior to joining Binance, Richard was CEO of the Financial Services Regulatory Authority at Abu Dhabi Global Market (ADGM); Chief Regulatory Officer of the Singapore Exchange (SGX); and Director of Corporate Finance in the Monetary Authority of Singapore.

With Richard and the entire team, I’m confident that the best days for @Binance and the crypto industry lie ahead. As a shareholder and former CEO with historical knowledge of our company, I will remain available to the team to consult as needed, consistent with the framework set out in our U.S. agency resolutions.

What’s next for me? I will take a break first. I have not had a single day of real (phone off) break for the last 6 and a half years. After that, my current thinking is I will probably do some passive investing, being a minority token/shareholder in startups in areas of blockchain/Web3/DeFi, AI and biotech. I am happy that I will finally have more time to spend looking at DeFi. I can’t see myself being a CEO driving a startup again. I am content being a one-shot (lucky) entrepreneur. Should there be listeners, I may be open to being a coach/mentor to a small number of upcoming entrepreneurs, privately. If for nothing else, I can at least tell them what not to do.

On that note, I am proud to point out that in our resolutions with the U.S. agencies they:
- do not allege that Binance misappropriated any user funds, and
- do not allege that Binance engaged in any market manipulation.

Funds are SAFU! With that, I look forward to seeing the new leadership take the reins. Please join me in congratulating Richard on his well-deserved promotion.

Onwards!
CZ


0

36.2K

145.8K

31K


何維健 Derrick Hoh@derrickhoh

2022.07.06 12:54

The last 30 weeks have been, quite frankly, the most exhilarating chapter of my life. It is not without its challenges... Just hope Jellies and Baby J are as healthy as they can be. Swipe through to see Baby J's progress; the 3D scan is at the end. #expectantdad #babyultrasound #3dultrasound


0

1

6

0


Search results related to "Baby_Its_Both"