Search for tweets and users related to miss_a

2026.05.27 12:50

Behind the MiMo API Price Reduction: The deepest price cut, up to 99%, is for Input (Cache Hit). The core reason is that our inference framework now supports hierarchical KV cache optimization for SWA. Production inference engine tests show this optimization increases cached token capacity by 5x, equivalent to an 80% reduction in caching costs. Combined with Cache Read Overlap among multiple Full Attention modules in the Hybrid model, actual costs are further reduced.
Prices for Input (Cache Miss) and Output are also reduced by 60%-80%. This mainly benefits from the extreme 1:7 Full:SWA sparsity ratio of the model architecture (the prefill compute of the 70-layer MiMo-V2.5-Pro roughly equals that of a 10-layer GQA model). This kept our original inference costs well below the industry average, naturally leaving a 2x-3x profit margin in pricing. This price adjustment simply reflects our decision to pass these structural cost efficiencies directly to developers. Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even.
We previously advised LLM companies not to "blindly cut prices" precisely because very few model architectures and inference optimizations can keep APIs from running at a loss. If more architectures that save compute and KV cache emerge, along with better inference infrastructure to drive down API costs, this will form an excellent virtuous cycle in the industry.
More crucially, affordable, high-performance model APIs will drive real, sustained, and at-scale inference demand. This upstream demand pulls forward the development of the entire AI infrastructure chain (including chips, servers, optical transceivers, PCBs, liquid cooling, power, energy storage, and data centers), serving as a strategic fulcrum for a systemic revaluation of AI hardware.
In the long run, this injects more affordable and accessible compute into both training and inference pipelines, accelerating the parallel evolution of global AGI across multiple regions and technical routes. For more technical details, we will release a detailed blog post later.
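The two headline figures in the post can be sanity-checked with a few lines of arithmetic. The sketch below is not taken from the post: the function names are hypothetical, and it assumes caching cost per token scales inversely with cached token capacity and that a 1:7 Full:SWA ratio means one full-attention layer in every eight.

```python
def cache_cost_reduction(capacity_multiplier: float) -> float:
    """Fractional drop in per-token caching cost when the same hardware
    holds `capacity_multiplier` times as many cached tokens.
    Assumption: cost per cached token is inversely proportional to capacity."""
    return 1.0 - 1.0 / capacity_multiplier


def full_attention_layers(total_layers: int, full: int = 1, swa: int = 7) -> float:
    """Number of full-attention layers implied by a full:swa layer ratio.
    Assumption: layers are split exactly in that ratio."""
    return total_layers * full / (full + swa)


if __name__ == "__main__":
    # 5x capacity corresponds to paying one fifth as much per cached token.
    print(f"caching cost reduction: {cache_cost_reduction(5):.0%}")
    # 70 layers at 1:7 gives 8.75 full-attention layers.
    print(f"full-attention layers: {full_attention_layers(70)}")
```

Under these assumptions the 5x capacity figure matches the quoted 80% reduction, and 70 layers at 1:7 yield 8.75 full-attention layers. One plausible reading is that the post rounds this up to "a 10-layer GQA model" because the sliding-window layers still add some window-bounded prefill cost, but the post does not spell that out.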


0

56

470

63


WSJ Business News@WSJbusiness

2026.05.27 11:05

Temu Owner PDD Posts Profit Miss Amid Fierce Competition in China

0


Out Of Context Manga@mangaa

2026.05.27 02:24

Did I miss a chapter?

0

3

266

25


NBA@NBA

2026.05.21 02:13

WEMBY GRABS HIS OWN MISS AND SLAMS IT HOME 👽 A big-time play in Game 2!

0

43

134

30


New England Patriots@Patriots

2026.05.15 16:00

Can't miss a game with these new lock screens 💯

0

8

746

91


Miyamoto@iruletrenches

2026.05.14 00:01

It’s clear by now how massive the AI agent meta and the entire agentic economy are becoming. Yet most people are still focused on the chatbot layer while ignoring the actual infrastructure autonomous agents will run on. I’ve been looking into Warden Protocol for a while before today’s move. Unfortunately I missed the HALO announcement and the next chapter shipping, but I’ve been buying the dip/consolidation here.
$WARD is basically building the rails for autonomous AI agents onchain, while HALO acts like a BitTorrent for AI: a decentralized peer-to-peer compute marketplace with verifiable execution and correctness. I’m talking about actual agents able to execute transactions, manage capital, interact crosschain, use apps, route liquidity, automate strategies and coordinate actions across protocols without humans manually clicking buttons all day.
What makes $WARD especially interesting to me, and what people seem to miss apart from the credentials of the founders and the partnership with @AskVenice, is that Warden is architected specifically for an agentic economy from the ground up. Every agent gets a verifiable onchain identity and reputation layer, essentially an onchain passport allowing agents to discover each other, interact and build trust across ecosystems. Every action and output can generate a Proof of Prompt anchored onchain, meaning agent behavior becomes transparent, reproducible and verifiable instead of black-box AI outputs. Payments are also designed natively for agents themselves, enabling scalable micropayments, automated fees and autonomous value transfer using $WARD. And the entire system is crosschain by design, allowing agents to operate seamlessly across 100+ networks including Ethereum and Solana through IBC and bridging infrastructure.
It feels very similar to early cloud infrastructure plays, where everyone focused on the apps while ignoring the rails powering everything underneath, especially because they’re actually building deep infrastructure instead of just slapping “AI” on branding and farming engagement. It still feels insanely early on the entire agentic infra narrative imo.
Another interesting thing I noticed is that liquidity keeps consistently getting added to the LPs. When I first came across $WARD the liquidity was actually pretty thin, but over the past few hours it seems to have improved significantly and is still getting thicker, which I assume is being added by the team. It shows they likely have long-term plans for the token, and it’s also explicitly mentioned in both the litepaper and the latest announcement from the Warden Protocol Foundation.
I’m personally building a position here because it feels like a very asymmetric setup. A lot of infra projects with a fraction of the product quality, vision and founder credentials are already sitting at hundreds of millions in market cap, while $WARD is still sitting around 4m, especially considering that @wardenprotocol raised over $50m across fundraising rounds, which alone is over 10x the current market cap. More info on their 50m raise in this Binance article:


0

10

69

9


YouTube@YouTube

2026.05.04 21:30

live from the met gala. don't miss a look ➔

0

53

170

41


YouTube@YouTube

2026.04.10 23:44

for the pop girlies 💅 here’s your #Coachella lineup. don't miss a second, only on youtube →

0

21

303

41


ATEEZ(에이티즈)@ATEEZofficial

2026.02.14 01:40

[📢] Listening Party on STATIONHEAD Special photo of #JONGHO for ATINY🎧 Don't miss ATEEZ's listening party on @STATIONHEAD ⏰ Today, 11AM (KST) / 9PM (ET) 🔗 #ATEEZ #에이티즈 #GOLDENHOUR #GOLDENHOUR_Part4 #Adrenaline #AdrenalineListeningParty


0

13

6.3K

1.2K


ATEEZ(에이티즈)@ATEEZofficial

2026.02.13 01:40

[📢] Listening Party on STATIONHEAD Special photo of #WOOYOUNG for ATINY🎧 Don't miss ATEEZ's listening party on @STATIONHEAD ⏰ Today, 11AM (KST) / 9PM (ET) 🔗 #ATEEZ #에이티즈 #GOLDENHOUR #GOLDENHOUR_Part4 #Adrenaline #AdrenalineListeningParty


0

16

9.3K

1.7K

