Search tweets and users related to Transformers

2026.06.04 05:21

Jensen Huang just handed every AI cloud investor the clearest framework for picking winners and the question is who actually understands what he said (Save this). Compute is not just infrastructure anymore but rather revenue, and performance per watt is the mechanism by which that revenue becomes profit. The argument Jensen made at Computex deserves to be unpacked fully because it completely reframes the neocloud investment thesis. Every AI factory operates inside a fixed power envelope and once your data center is built and your power contracts are signed, that ceiling does not move. One gigawatt means one gigawatt and the only variable that determines how much money you make is how many profitable tokens you can squeeze out of each watt of electricity flowing through your facility. An operator who chooses cheaper, lower efficiency chips because the upfront cost looks attractive is not saving money and they are permanently handicapping their revenue ceiling for the life of that asset. Every watt that produces fewer tokens is a watt that will never recover those lost revenues, for as long as that infrastructure runs. Jensen's second point is about asset longevity and it is equally important to understand. AI software is evolving every few months from CNNs to Transformers to Mixture of Experts to agentic systems and that pace is not slowing down. A hardware architecture that cannot adapt to new software paradigms has a short useful life, and a short useful life means a high total cost of ownership. Infrastructure built on Nvidia's CUDA ecosystem has a built in software longevity advantage because every new model, framework, and optimization is written for CUDA first. Now apply that framework directly to Nebius, which is the most important stock in the neocloud category. Nebius built its entire infrastructure around full Nvidia integration from the ground up. 
Nvidia and Nebius announced a formal strategic partnership in March 2026 specifically to develop the next generation of hyperscale AI cloud deployments together. Nebius is already offering Blackwell Ultra GB300 NVL72-powered instances to customers, meaning it has the highest-performance GPU currently available commercially running inside its own infrastructure. The token economics follow directly from the architecture. Contracted power has now passed 3.5 gigawatts, with more than 75% of that capacity owned outright rather than leased. The Meta deal alone is worth $27 billion over five years, and the Microsoft agreement is worth up to $19.4 billion. The 2026 plan targets 480 megawatts of live AI cloud capacity, 150,000 GPUs deployed, and $3.7 billion in annualized revenue, implying next-twelve-month revenue growth of roughly 489%. Q1 2026 revenue was $399 million, up 684% year-over-year, and the CEO said on the earnings call that everything Nebius builds gets sold immediately. Fully booked capacity at an AI cloud running Nvidia's best hardware, inside a power-scarce environment where performance per watt is the direct driver of profitability, means Nebius's revenue ceiling moves in direct proportion to the power it can bring online. CoreWeave, a direct comparable, trades at a materially higher multiple on a smaller contracted power base. Nebius owns more of its capacity outright, has a longer-dated and larger contract backlog on a per-gigawatt basis, and is growing revenue at a faster rate. Milk Road remains extremely bullish on Nebius. Come join Milk Road Pro and get our full Nebius positioning breakdown and our other AI trades for just a dollar. Link down below!
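The fixed-power-envelope argument in the post above can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: the function name, the tokens-per-joule figures, and the price per million tokens are hypothetical placeholders, not numbers from the post or from any vendor.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def annual_token_revenue(power_watts, tokens_per_joule,
                         usd_per_million_tokens, utilization=1.0):
    """Yearly token revenue for a facility capped at `power_watts`.

    All inputs are hypothetical; the point is only that, with power
    fixed, revenue scales linearly with tokens produced per joule.
    """
    joules_per_year = power_watts * SECONDS_PER_YEAR * utilization
    tokens_per_year = joules_per_year * tokens_per_joule
    return tokens_per_year / 1e6 * usd_per_million_tokens


POWER_CAP_W = 1e9  # the "one gigawatt" ceiling from the post

# Two made-up chips: one produces twice the tokens per joule of the other.
efficient = annual_token_revenue(POWER_CAP_W, tokens_per_joule=2.0,
                                 usd_per_million_tokens=1.0)
cheaper = annual_token_revenue(POWER_CAP_W, tokens_per_joule=1.0,
                               usd_per_million_tokens=1.0)

print(f"revenue ratio (efficient / cheaper): {efficient / cheaper:.2f}")
```

Under these assumptions the revenue ratio equals the efficiency ratio, which is the post's claim restated. The sketch ignores chip purchase price, cooling overhead, and token pricing differences, all of which a real comparison would need.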

0

3

41

13

Victor Renard@VictorRenajlj

2026.05.26 01:50

This company just took news headlines by storm - Jacob Amsterdam just joined its Advisory Board. WHY? Copper recently traded above $14,000 per ton, near historic highs, as demand from AI infrastructure continues accelerating. Just one electric vehicle can use more than 80 kilograms of copper. Some estimates suggest a single large-scale AI data center can require up to 50,000 tons of copper across power systems, transformers, cooling infrastructure, wiring, and grid connections. AI data centers, robots, military systems, renewable energy, and EV production are all increasing global copper demand at the same time. Entire countries are upgrading electrical infrastructure simultaneously. And while demand rises, new copper mines can take more than a decade to develop. That is where NovaRed Mining enters the story. Its Wilmac Copper-Gold Project spans nearly three times the size of Manhattan across prospective copper-gold terrain in British Columbia. As the global race for copper intensifies, companies connected to future copper supply are attracting increasing investor attention. Always do your own research. This is not financial advice.

0

343

12.1K

1.7K

千寻｜AI 分享 🌸@Crypto_QianXun

2026.05.22 00:05

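40 genuinely useful GitHub repositories 1. public-apis: a collection of free APIs 2. build-your-own-x: learn by building 3. developer-roadmap: learn any technology 4. free-programming-books: free books 5. system-design-primer: master system design 6. coding-interview-university: teach yourself computer science 7. the-art-of-command-line: master the terminal 8. project-based-learning: learn through projects 9. you-dont-know-js: JavaScript in depth 10. the-book-of-secret-knowledge: hacker resources 11. tech-interview-handbook: ace your interviews 12. awesome-selfhosted: self-hosted apps 13. javascript-algorithms: visualized algorithms 14. 30-seconds-of-code: practical code snippets 15. gitignore: templates for every language 16. ollama: run AI models locally 17. langchain: build AI apps quickly 18. n8n: AI automation workflows 19. openclaw: local AI assistant 20. dify: build AI agents visually 21. langflow: drag-and-drop AI pipelines 22. mem0: memory layer for AI agents 23. browser-use: let AI control the browser 24. ruflo: Claude agent orchestration 25. crewai: multi-agent AI teams 26. hermes-agent: open-source AI agent 27. markitdown: convert files to Markdown 28. maigret: OSINT across 3,000+ sites 29. open-webui: self-hosted ChatGPT-style interface 30. aider: AI coding assistant in the terminal 31. agency-agents: a complete AI agent agency 32. tradingagents: multi-agent trading framework 33. browserbase-skills: Claude web SDK 34. autogen: Microsoft's multi-agent framework 35. metagpt: an AI-agent software company 36. lobe-hub: visual multi-agent platform 37. huggingface-transformers: the foundation of modern AI 38. cocoindex: long-text agent engine 39. freeCodeCamp: learn to code for free 40. stable-diffusion-webui: local AI image generation. Most developers haven't saved a single one. Smart people saved all 40.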

0

1

0

无颜@WY_mask

2026.05.16 01:00

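Folks, here are 40 useful GitHub repositories. I strongly recommend bookmarking them! 1. public-apis: a collection of free APIs 2. build-your-own-x: learn by building 3. developer-roadmap: learn any technology 4. free-programming-books: free books 5. system-design-primer: master system design 6. coding-interview-university: teach yourself computer science 7. the-art-of-command-line: master the terminal 8. project-based-learning: learn through projects 9. you-dont-know-js: JavaScript in depth 10. the-book-of-secret-knowledge: hacker resources 11. tech-interview-handbook: ace your interviews 12. awesome-selfhosted: self-hosted apps 13. javascript-algorithms: visualized algorithms 14. 30-seconds-of-code: practical code snippets 15. gitignore: templates for every language 16. ollama: run AI models locally 17. langchain: build AI apps quickly 18. n8n: AI automation workflows 19. openclaw: local AI assistant 20. dify: build AI agents visually 21. langflow: drag-and-drop AI pipelines 22. mem0: memory layer for AI agents 23. browser-use: let AI control the browser 24. ruflo: Claude agent orchestration 25. crewai: multi-agent AI teams 26. hermes-agent: open-source AI agent 27. markitdown: convert files to Markdown 28. maigret: OSINT across 3,000+ sites 29. open-webui: self-hosted ChatGPT-style interface 30. aider: AI coding assistant in the terminal 31. agency-agents: a complete AI agent agency 32. tradingagents: multi-agent trading framework 33. browserbase-skills: Claude web SDK 34. autogen: Microsoft's multi-agent framework 35. metagpt: an AI-agent software company 36. lobe-hub: visual multi-agent platform 37. huggingface-transformers: the foundation of modern AI 38. cocoindex: long-text agent engine 39. freeCodeCamp: learn to code for free 40. stable-diffusion-webui: local AI image generation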

0

22

798

237

小樱💞｜实用工具分享@xiaoying_eth

2026.05.15 00:13

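40 genuinely useful GitHub repositories 1. public-apis: a collection of free APIs 2. build-your-own-x: learn by building 3. developer-roadmap: learn any technology 4. free-programming-books: free books 5. system-design-primer: master system design 6. coding-interview-university: teach yourself computer science 7. the-art-of-command-line: master the terminal 8. project-based-learning: learn through projects 9. you-dont-know-js: JavaScript in depth 10. the-book-of-secret-knowledge: hacker resources 11. tech-interview-handbook: ace your interviews 12. awesome-selfhosted: self-hosted apps 13. javascript-algorithms: visualized algorithms 14. 30-seconds-of-code: practical code snippets 15. gitignore: templates for every language 16. ollama: run AI models locally 17. langchain: build AI apps quickly 18. n8n: AI automation workflows 19. openclaw: local AI assistant 20. dify: build AI agents visually 21. langflow: drag-and-drop AI pipelines 22. mem0: memory layer for AI agents 23. browser-use: let AI control the browser 24. ruflo: Claude agent orchestration 25. crewai: multi-agent AI teams 26. hermes-agent: open-source AI agent 27. markitdown: convert files to Markdown 28. maigret: OSINT across 3,000+ sites 29. open-webui: self-hosted ChatGPT-style interface 30. aider: AI coding assistant in the terminal 31. agency-agents: a complete AI agent agency 32. tradingagents: multi-agent trading framework 33. browserbase-skills: Claude web SDK 34. autogen: Microsoft's multi-agent framework 35. metagpt: an AI-agent software company 36. lobe-hub: visual multi-agent platform 37. huggingface-transformers: the foundation of modern AI 38. cocoindex: long-text agent engine 39. freeCodeCamp: learn to code for free 40. stable-diffusion-webui: local AI image generation. Most developers haven't saved a single one. Smart people saved all 40.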

0

15

446

115

John Carmack@ID_AA_Carmack

2026.01.24 02:11

#PaperADay# 10 LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics The comments on #PaperADay# 3 recommended this paper as the state of the art JEPA paper, and it does look much better! They acknowledge that much of the prior JEPA research is ad-hoc and full of heuristics, but here they make strong theoretical claims of optimality and provide proofs (which I did not read). The first claim is that isotropic gaussian is the unique optimal embedding distribution for both linear and nonlinear probing, minimizing worst-case risk across downstream tasks. I would have taken that on faith with just a “sounds good to me”, but they go into it with details and examples. Actually getting an isotropic gaussian in high dimensions is easier said than done. They present Sketched Isotropic Gaussian Regularization (SIGReg) as a well behaved loss function to achieve this after analyzing a number of different statistical tests, and they claim it beats the curse of dimensionality with linear scalability. The final loss is just a blend factor to weight the JEPA prediction loss against the SIGReg isotropy loss. This is the one tunable hyperparameter for LeJEPA. Despite the P in JEPA, they don’t use predictor networks here, they just directly compare view embeddings for the JEPA loss. Predictor networks could still be useful for video sequences, especially when conditioned with action information for agents / robots. Each training image is augmented to produce 2 global views and 6 local views with different spatial scales but the same set of color and geometric transformations. The loss is the average MSE between the average of the global view embeddings and each of the local view embeddings. I don’t have a good feel for the tradeoffs in their view transforms, which still seem very much in the ad-hoc space, but they will determine the nature of what gets filtered out of the representation. 
Learning what doesn’t matter is critical, but the specification of “matters” is only implicit in the view transformations. LeJEPA itself is architecture independent – anything that digests a batch of samples from a dataset into vectors can be used. Vision transformers, MLP, ConvNets, etc. The specific augmentations for views would be input modality specific, but the LeJEPA algorithm could work on audio, images, video, or other things. They show that the LeJEPA loss on a large foundation model is very indicative of downstream task performance, both directly, and with a heuristic to improve the predictive power of the loss farther. They also show that it can be used to train from scratch on small datasets with as few as 1000 samples and achieve better results than probing a conventional general foundation model. I was pleased to see sample code blocks in the paper instead of greek-laden pseudocode, as well as a github repo. Appendix D has interesting details on generating good coverage of unit hyperspheres with low discrepancy samples by transforming Sobol sequences, but this is only for their theoretical analysis, and they show you are better off just making new random hypervectors every batch, with even 16 random vectors outperforming a fixed set of thousands. Some questions: In the discussion of non-linear probing, only kNN and kernel methods are mentioned, presumably for their theoretical analysis tractability, but would an MLP generally perform better? A JEPA embedding is not fully reversible like NICE or a RevNet, so how does it react to inputs that are far outside the training set? Will novel inputs map to unique embeddings, or could they be collapsed onto the codes from the training set? How would the embeddings evolve in a continuous learning environment, as novel inputs are added to the training mix? Can a JEPA be overtrained – is lower training loss always better, or would there be an optimal early stopping point?
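The prediction term described in the post (average the global-view embeddings, then take the mean MSE against each local-view embedding) can be sketched in plain Python. This is a minimal illustration and not the paper's code: the function names are made up, SIGReg itself is not implemented and is passed in as a precomputed number, and the convex blend `(1 - lam) * prediction + lam * isotropy` is an assumed form of the "blend factor" the post mentions.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def prediction_loss(global_embs, local_embs):
    """Average MSE between the mean global embedding and each local one."""
    center = mean_vector(global_embs)
    return sum(mse(center, v) for v in local_embs) / len(local_embs)


def blended_loss(prediction, isotropy, lam):
    """Assumed convex blend of the two terms; `lam` is the one knob."""
    return (1.0 - lam) * prediction + lam * isotropy


# Toy 2-D embeddings (the post describes 2 global and 6 local views;
# only 2 local views are used here to keep the numbers easy to follow).
global_embs = [[1.0, 0.0], [0.0, 1.0]]
local_embs = [[0.5, 0.5], [1.5, 0.5]]

pred = prediction_loss(global_embs, local_embs)
total = blended_loss(pred, isotropy=1.0, lam=0.1)  # isotropy value is a stand-in
print(pred, total)
```

In a real training loop the embeddings would come from the encoder and the isotropy term from a SIGReg implementation; both are replaced by fixed numbers here so the arithmetic stays visible.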

0

23

311

27

François Chollet@fchollet

2024.03.21 23:46

Keras 3.1 introduced in-place int8 quantization for Dense/EinsumDense (and thus all Transformers). It delivers ~20% speedups over float16 on JAX & TF with recent GPUs.
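For readers unfamiliar with int8 quantization, the general idea can be sketched without any framework. The code below is a simple abs-max, per-tensor scheme chosen for illustration; it is an assumption that this resembles what a Dense layer does internally, and it does not use or reproduce the Keras API.

```python
def quantize_int8(weights):
    """Abs-max symmetric quantization of a flat list of floats.

    Returns (quantized integers in [-127, 127], scale), so that
    weight is approximately q * scale.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]  # toy values, not from any model
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, restored)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`). The speedup the post reports comes from running the matrix multiply on 8-bit integers, which this sketch does not attempt to show.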

0

3

122

12

TikTok US@tiktok_us

2023.05.08 16:12

the leader of the Autobots is now on TikTok. unleash your inner Optimus Prime with our brand new #TextToSpeech# Voice. #Transformers# @transformers

0

46

424

73

François Chollet@fchollet

2023.02.03 17:36

New tutorial on semantic segmentation with the Segformer model. The impact of Transformers in CV is nascent but growing fast...

0

3

115

11

Andrej Karpathy@karpathy

2022.11.18 01:37

The non-obvious crux of the shift is an empirical finding, emergent only at scale, and well-articulated in the GPT-3 paper. Basically, Transformers demonstrate the ability of "in-context" learning. At run-time, in the activations. No weight updates.

0

5

221

19

Search results related to "Transformers"