Search for tweets and users related to "train"

2026.05.28 08:53

A trainer's forbidden affair in China ended in his death. When the husband of the woman he was seeing came home from work early, fitness coach Huang Mao climbed out over the balcony, fell from the building, and died at the scene.

0

1.1K

26.1K

1.4K

Elon Musk@elonmusk

2026.05.28 06:27

SpaceX has almost finished writing V1.0 of an in-house AI training stack in C that exact-maps to 220k GB300s with 800G NICs, making heavy use of pipeline parallelism and getting as close to bare metal as possible. The potential speed improvement vs JAX for large training runs is over an order of magnitude.

0

5.1K

60.4K

5.8K

Elon Musk@elonmusk

2026.05.28 02:28

@JohnLeFevre Train looks great

0

504

5.3K

184

Ucaird@Ucaird_zenith

2026.05.28 01:29

been thinking about this lately
every app in crypto wants more of your time
more clicks. more sessions. more screen time
@sleepagotchi is the only one that literally wins when you put your phone down
like the whole game is just. go to sleep. we'll handle the rest
idk why that feels so radical but it does
maybe because we've been trained to think grinding = winning and rest = falling behind
but your dino doesn't care about your alpha. he just wants you to close your eyes before midnight for once in your life
trying to be that guy this week. we'll see how it goes 🦖
you actually sleeping on time or just saying you will? 👇

0

201

203

33

OpenSea@opensea

2026.05.27 20:53

What if you trained an AI to recreate CryptoPunks? It would fail. That failure is the art. That's what @MichaelHirsch built with Slonks. Full interview in the next tweet 👇

0

23

125

39

Gavin Baker@GavinSBaker

2026.05.27 16:35

Composer 2.5 being Pareto dominant in coding per CursorBench is important. This is after only a few weeks of supplemental training and/or RL in the Colossus 2 cluster. The 1.5 trillion parameter version of Grok will likely be a much better base model than Kimi. We shall see.

0

39

768

56

Eric Hartford@QuixiAI

2026.05.27 15:19

I created a training pipeline to remove propaganda and gaslighting from Chinese models! I'm thrilled to announce LazarusAI's ReAligned-Qwen3.5 series of models, finetuned to reduce Chinese ideological bias and censorship, refusal behavior, and state-narrative framing. I use an SFT + GRPO pipeline with a dataset crafted to target the taxonomy of Chinese censorship and bias, along with my ReAligned classifier model as a GRPO reward signal.

0

16

96

9

John LeFevre@JohnLeFevre

2026.05.27 15:12

Hollywood is cooked. Even Oscar winner Natalie Portman is out here doing luxury train promos like a mid-tier influencer. My kids and their friends haven’t watched a movie or TV show in years. It’s all YouTube. The entire industry is running on fumes.

0

330

4.3K

232

Joy Guo@Joyyguo

2026.05.27 14:12

Here is how orbital compute ties the three segments into one unstoppable system:
Space: Starship gives ultra-cheap, high-cadence launch capacity to deploy massive amounts of compute hardware into orbit.
Connectivity: Starlink’s laser inter-satellite links turn thousands (eventually millions) of satellites into a distributed, low-latency orbital supercomputer network with fast Earth downlink.
And finally, the AI segment runs and monetizes the actual compute, training and inference at unprecedented scale.

0

Fuli Luo@_LuoFuli

2026.05.27 12:50

Behind the MiMo API Price Reduction:
The deepest price cut, up to 99%, is for Input (Cache Hit). The core reason is our inference framework now supports hierarchical KV cache optimization for SWA. Production inference engine tests show this optimization increases cached token capacity by 5x, equivalent to an 80% reduction in caching costs. Combined with Cache Read Overlap among multiple Full Attention modules in the Hybrid model, actual costs are further reduced.
Prices for Input (Cache Miss) and Output are also reduced by 60%-80%. This mainly benefits from the extreme 1:7 Full:SWA sparsity ratio brought by the model architecture (the prefill compute of the 70-layer MiMo-V2.5-Pro roughly equals a 10-layer GQA model). This kept our original inference costs well below the industry average, naturally leaving a 2x-3x profit margin in pricing. This price adjustment simply reflects our decision to pass these structural cost efficiencies directly to developers.
Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even. We previously advised LLM companies not to "blindly cut prices" precisely because very few model architectures and inference optimizations can keep API costs from running at a loss.
If more architectures that save compute and KV cache emerge, along with better inference Infra to drive down API costs, this will form an excellent virtuous cycle in the industry. More crucially, affordable, high-performance model APIs will drive real, sustained, and at-scale inference demand. This upstream demand pulls forward the development of the entire AI infrastructure chain—including chips, servers, optical transceivers, PCBs, liquid cooling, power, energy storage, and data centers—serving as a strategic fulcrum for a systemic revaluation of AI hardware.
In the long run, this injects more affordable and accessible compute into both training and inference pipelines, accelerating the parallel evolution of global AGI across multiple regions and technical routes. For more technical details, we will release a detailed Blog post later.
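
The arithmetic behind two figures in the post above can be checked with a short calculation. This is a minimal sketch, not the MiMo team's method: it assumes caching cost per token scales inversely with how many tokens the same hardware can hold, and it reads "1:7 Full:SWA" as one full-attention layer for every seven SWA layers. Both function names are hypothetical.

```python
def caching_cost_reduction(capacity_multiplier: float) -> float:
    """Fractional drop in per-token caching cost.

    Assumption (not stated in the post): the same hardware spend is
    spread over `capacity_multiplier` times as many cached tokens.
    """
    if capacity_multiplier <= 0:
        raise ValueError("capacity_multiplier must be positive")
    return 1.0 - 1.0 / capacity_multiplier


def full_attention_layers(total_layers: int, full: int = 1, swa: int = 7) -> float:
    """Full-attention layer count implied by a full:swa layer ratio.

    Assumption: the ratio means `full` full-attention layers for every
    `swa` sliding-window layers.
    """
    return total_layers * full / (full + swa)


if __name__ == "__main__":
    print(caching_cost_reduction(5))   # 5x capacity
    print(full_attention_layers(70))   # 70 layers at 1:7
```

Under these assumptions a 5x capacity increase corresponds to a 0.8 (80%) cost reduction, matching the post, and a 70-layer model at 1:7 has 8.75 full-attention layers. The post's "10-layer" equivalence presumably also counts the smaller cost of the SWA layers, which this sketch does not model.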

0

56

470

63

Search results related to "train"