Why did xAI hand over a 220,000-GPU cluster to Anthropic?
The technical backdrop to xAI's decision to hand Colossus 1 over to Anthropic in its entirety is more interesting than it appears. xAI deployed more than 220,000 NVIDIA GPUs at its Colossus 1 data center in Memphis. Of these, roughly 150,000 are estimated to be H100s, 50,000 H200s, and 20,000 GB200s. In other words, three different generations of silicon are mixed together inside a single cluster — a "heterogeneous architecture."
For distributed training, however, this configuration is close to a disaster, according to engineers familiar with the setup. In distributed training, 100,000 GPUs must finish a single step simultaneously before the cluster can advance to the next one. Even if the GB200s finish their computation first, the remaining 99,999 chips have to wait for the slower H100s — or for any GPU that has hit a stack-related snag — to catch up. This is known as the straggler effect. The 11% GPU utilization rate (MFU: the share of theoretical FLOPs actually realized) at xAI recently reported by The Information can be read as the numerical fallout of this problem. It stands in stark contrast to the 40%-plus MFU figures achieved by Meta and Google.
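The straggler effect can be put into rough numbers with a toy model. The sketch below assumes every GPU receives an equal share of work per step, and the relative speeds (H100 = 1.0, H200 = 1.3, GB200 = 2.5) are illustrative placeholders, not benchmark figures; the function names are hypothetical.

```python
def sync_step_time(work_per_gpu, speeds):
    """One synchronous step ends only when the slowest GPU finishes."""
    return max(work_per_gpu / s for s in speeds)


def fleet_utilization(work_per_gpu, speeds):
    """Share of the fleet's aggregate capacity actually used during one step."""
    step = sync_step_time(work_per_gpu, speeds)
    useful_work = work_per_gpu * len(speeds)
    available_work = sum(speeds) * step
    return useful_work / available_work


# GPU mix from the text, scaled down 1000x. Relative speeds are ASSUMED
# placeholders (H100 = 1.0), not measured results.
mixed = [1.0] * 150 + [1.3] * 50 + [2.5] * 20
uniform = [1.0] * 220
one_straggler = [1.0] * 219 + [0.5]  # a single GPU running at half speed

for name, fleet in [("uniform", uniform), ("mixed", mixed),
                    ("one straggler", one_straggler)]:
    step = sync_step_time(1.0, fleet)
    util = fleet_utilization(1.0, fleet)
    print(f"{name:14s} step={step:.2f} util={util:.1%}")
```

Under these assumptions, a single half-speed GPU is enough to double the step time for the whole fleet. The model covers only compute imbalance, so it is not meant to reproduce the reported 11% figure, which would also reflect communication and software overheads.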
The problem runs deeper still. As discussed earlier, NVIDIA's NCCL has traditionally been optimized for a ring topology. It works beautifully at the 1,000–10,000 GPU scale, but once you push into the 100,000-unit range, the latency of a single pass of data around the ring becomes punishingly long. GPUs need to churn through computations rapidly to keep MFU high, but while they wait for data to arrive over the network fabric, more than half of the silicon sits idle. Google sidestepped this bottleneck with its own custom topology (Google's OCS: Apollo/Palomar), but xAI, by my read, has not yet reached that stage.
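The scaling argument can be sketched with the textbook latency-bandwidth cost model for a ring all-reduce, which takes 2(N-1) communication steps. This is a minimal sketch assuming one flat ring across all GPUs, which is a simplification of how large clusters are actually wired; the 5 µs per-hop latency, 1 GB message size, and 50 GB/s link bandwidth are hypothetical values chosen only for illustration.

```python
def ring_allreduce_time(n_gpus, msg_bytes, hop_latency_s, bandwidth_bytes_per_s):
    """Return (latency_term, bandwidth_term) in seconds for a flat ring.

    A ring all-reduce runs 2 * (N - 1) steps (reduce-scatter, then
    all-gather), and each step moves a chunk of msg_bytes / N.
    """
    steps = 2 * (n_gpus - 1)
    latency_term = steps * hop_latency_s
    bandwidth_term = steps * (msg_bytes / n_gpus) / bandwidth_bytes_per_s
    return latency_term, bandwidth_term


# ASSUMED parameters: 1 GB of gradients, 5 microseconds per hop, 50 GB/s links.
for n in (1_000, 10_000, 100_000):
    lat, bw = ring_allreduce_time(n, 1e9, 5e-6, 50e9)
    print(f"N={n:>7,}  latency={lat * 1e3:8.1f} ms  bandwidth={bw * 1e3:6.1f} ms")
```

The bandwidth term stays nearly constant as N grows, while the latency term grows linearly with N. With these assumed numbers the latency term overtakes the bandwidth term somewhere between 1,000 and 10,000 GPUs, which is the effect the paragraph above describes.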
Layer Blackwell's (GB200) "power smoothing" issue on top, and the picture comes into focus. According to Zeeshan Patel, formerly in charge of multimodal pre-training at xAI, Blackwell GPUs draw power so aggressively that the chip itself includes a hardware feature for smoothing power delivery. xAI's existing software stack, however, was optimized for Hopper and does not account for the characteristics of the new hardware; when it imposes irregular loads on the chip, the silicon physically self-destructs (it literally melts). That means the modeling stack must be rewritten from scratch, which in turn means scaling is far harder than most of us imagine.
Pulling all of this together points to a single conclusion. xAI judged that training frontier models on Colossus 1 simply was not efficient enough to be worthwhile. It therefore moved its own training workloads wholesale onto Colossus 2, built as a 100% Blackwell homogeneous cluster. Colossus 1, on the other hand — whose mixed architecture is far less crippling for inference, which parallelizes more forgivingly — was leased in its entirety to an Anthropic that desperately needed inference capacity.
Many observers point to what looks like a contradiction: Elon Musk poured enormous capital into building Colossus, only to hand the core asset over to a direct competitor in Anthropic. Others read it as xAI capitulating because it is a "middling frontier lab." But these are surface-level reads.
Look at the numbers and a different picture emerges. xAI today holds upwards of 550,000 GPUs in total (on an H100-equivalent performance basis), and Colossus 1 (220,000 units) accounts for only about 40% of the total available capacity. Colossus 2, built entirely on Blackwell, is already operational and continuing to expand. Elon kept the all-Blackwell homogeneous cluster (Colossus 2) for himself and leased out the older, mixed-generation Colossus 1. In other words, he handed the pain of rewriting the stack (the MFU-11% debacle) to Anthropic, while keeping his own focus on training the next generation of models.
The real point, then, is this. Elon's objective appears to be positioning ahead of the SpaceXAI IPO at a $1.75 trillion valuation, currently floated for as early as June. The narrative SpaceXAI now needs is that xAI — long the "sore finger" — is not merely a research lab burning cash, but a business with a "neo-cloud" model in the mold of AWS, capable of leasing surplus assets at high yields.
From a cost-of-capital perspective, an "AGI cash incinerator" is far less attractive to investors than a "data-center landlord generating cash."
As noted above, the most important detail of the Colossus 1 lease is that it is for inference, not training. Unlike training, inference requires far less tightly synchronized inter-GPU communication. Even when the chips are heterogeneous, the workload parcels out cleanly across them in parallel. The straggler effect — the chief weakness of a mixed cluster — is essentially neutralized for inference workloads.
Furthermore, with Anthropic occupying all 220,000 GPUs as a single tenant, the network-switch jitter (unanticipated latency) that arises under multi-tenancy disappears. The two sides' technical weaknesses end up complementing each other almost exactly.
One insight follows. As a training cluster mixing H100/H200/GB200, Colossus 1 was an asset that could only deliver an MFU of 11%. The moment it was handed over to a single inference customer, however, that asset transformed into a cash-flow asset rented out at roughly $2.60 per GPU-hour (a weighted average of the lease rates across GPU types). For xAI, what was a "cluster from hell" for training has become a "golden goose" minting $5–6 billion in annual revenue when redeployed for inference. Elon's genius, I would argue, lies not in the model but in this asset-rotation structure.
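The revenue figure can be sanity-checked with simple arithmetic from the numbers quoted above. The sketch assumes all 220,000 GPUs are billed around the clock for a full year (100% occupancy), which the text does not state; the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def annual_lease_revenue(gpus, rate_per_gpu_hour, occupancy=1.0):
    """Yearly lease revenue in dollars; occupancy=1.0 is an ASSUMPTION."""
    return gpus * rate_per_gpu_hour * HOURS_PER_YEAR * occupancy


revenue = annual_lease_revenue(220_000, 2.60)
print(f"${revenue / 1e9:.2f}B per year")  # prints "$5.01B per year"
```

At full occupancy the blended $2.60 rate lands at the low end of the $5–6 billion range cited in the text.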
The weight of that $6 billion becomes clearer when set against xAI's income statement. Annualizing xAI's 1Q26 net loss yields roughly $6 billion in losses per year. The $5–6 billion in annual revenue generated by leasing Colossus 1 to Anthropic, in other words, almost perfectly hedges xAI's loss figure. This single deal effectively pulls xAI to break-even.
Heading into the SpaceXAI IPO, this functions as a core line of financial defense. From a cost-of-capital standpoint, if the image shifts from "research lab burning cash" to "infrastructure tollgate stably printing $6 billion a year," the entire tone of the offering can change.
(May 8, 2026, Mirae Asset Securities)
A life devoid of productivity and genuine meaning ought to be empty and insignificant for the modern individual.
The author asserts that for the modern individual, a life lacking both productivity and genuine meaning should feel profoundly empty and insignificant. This is not a moral judgment but a statement of psychological and existential reality. In an era where self-realization and contribution are central to personal identity, mere survival or passive consumption no longer suffices.
Productivity here means the active creation of value, whether through work, relationships, art, or service. Genuine meaning arises when that productivity is aligned with deeper purpose. When both are absent, life loses its weight and luster. The individual senses a void that no external distraction can fill.
This serves as a quiet challenge. It reminds us that modern freedom and opportunity carry a corresponding responsibility: to live deliberately, to create, and to invest our time in what truly matters. A life without productivity and meaning is not merely unfortunate; it is, in the author’s view, incompatible with the dignity and aspirations of the contemporary human being.
First video of LK-99 Full Levitation, aka flux-pinning
This video was just posted to the Chinese video-sharing site BiliBili and claims to be a highly pure synthesized sample of LK-99.
What is the physical phenomenon behind this and what does it mean?
Levitation of superconducting materials is a phenomenon unique to what are called Type-II superconductors, and is an effect whereby magnetic field lines become 'trapped' as they pass through the material, providing the force needed to levitate. This is what is shown in the popular images and videos of cryogenically-cooled discs floating above a magnet, frequently seen online and in the pinned post on my profile.
You can think of this like strands of hair being caught in gum: the gum is suspended in mid-air by adhering strongly to the hair as the hair passes through it. The hair in this case is the magnetic field lines and the gum is the Type-II superconductor. Just as hair comes in individual strands (in other words, hair is 'quantized' or 'discrete'), so the flux trapped at the 'pinning centers' is quantized in what are called 'magnetic vortices'. The quantization of pinned flux lines is a key property and distinguishing characteristic of Type-II superconductors (although it can technically occur in Type-I superconductors if the material thickness is smaller than the London penetration depth, which is indeed very small; specifics for the physics nerds out there).
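To put a number on that quantization: each vortex carries one magnetic flux quantum, h/(2e), where the factor of 2 reflects the paired electrons that carry the supercurrent. The sketch below computes that value from the exact SI constants and estimates how many vortices thread a sample; the 0.1 T field and 1 mm² area are hypothetical inputs, and `vortex_count` is an illustrative helper name.

```python
PLANCK_H = 6.62607015e-34            # J*s, exact by SI definition
ELEMENTARY_CHARGE = 1.602176634e-19  # C, exact by SI definition

# Magnetic flux quantum in webers (about 2.07e-15 Wb).
FLUX_QUANTUM = PLANCK_H / (2 * ELEMENTARY_CHARGE)


def vortex_count(field_tesla, area_m2):
    """Approximate number of flux quanta threading an area at a given field."""
    return field_tesla * area_m2 / FLUX_QUANTUM


# ASSUMED example: 0.1 T through a 1 mm^2 cross-section.
print(f"flux quantum = {FLUX_QUANTUM:.4e} Wb")
print(f"vortices     = {vortex_count(0.1, 1e-6):.2e}")
```

Even a small sample in a modest field holds tens of millions of vortices, which is why pinning many of them at defects can add up to a macroscopic holding force.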
Flux-pinning is entirely unique to superconductors and is also wholly distinct from the Meissner effect. It is not a property of diamagnets or diamagnetism.
At @TRIUMFLab I contributed to flux-pinning studies in niobium crystal superconducting radio-frequency (SRF) cavities used for particle acceleration. In that application, trapped flux poses an issue by increasing the remnant surface resistivity of the cavity, which decreases its effective quality factor or Q-factor, a measure of a resonator's efficiency. SRF cavities typically have Q-factors on the order of 10^10, and trapped flux at pinning centers reduces the maximum effective accelerating electric field used to drive charged particle bunches close to the speed of light.
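The link between surface resistance and Q-factor can be illustrated with the standard relation Q0 = G / Rs, where G is a geometry factor fixed by the cavity shape. This is a rough sketch, not data from the studies mentioned: the 270 Ω geometry factor, the 27 nΩ baseline resistance, and the 10 nΩ added by trapped flux are all assumed values chosen for illustration.

```python
def quality_factor(geometry_factor_ohm, surface_resistance_ohm):
    """Unloaded quality factor Q0 = G / Rs."""
    return geometry_factor_ohm / surface_resistance_ohm


GEOMETRY_FACTOR = 270.0    # ohms; ASSUMED, depends on cavity shape
BASELINE_RS = 27e-9        # ohms; ASSUMED baseline surface resistance
TRAPPED_FLUX_RS = 10e-9    # ohms; HYPOTHETICAL extra resistance from trapped flux

q_clean = quality_factor(GEOMETRY_FACTOR, BASELINE_RS)
q_trapped = quality_factor(GEOMETRY_FACTOR, BASELINE_RS + TRAPPED_FLUX_RS)

print(f"Q0 without trapped flux: {q_clean:.2e}")
print(f"Q0 with trapped flux:    {q_trapped:.2e}")
print(f"relative drop:           {1 - q_trapped / q_clean:.0%}")
```

Because Q0 scales inversely with surface resistance, a few extra nano-ohms are enough to cost a sizeable fraction of the quality factor under these assumptions.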
Flux pinning is thought to arise in some Type-II superconductors from small imperfections in the crystal, also called volume defects, that enable flux to penetrate the material. In SRF cavities, an issue that arises is that any magnetic field passing through the material, e.g. the Earth's background field, can become pinned or trapped inside the cavity as it transitions into the superconducting state. See some attached plots in the comments from a study showing how the surface resistivity of SRF cavities increases with the strength of the background field present as the cavity transitions into the superconducting state.
This is the first video I am aware of that claims to show the flux-pinned levitation of an LK-99 sample. If this is in fact what is happening, then it is a unique and promising finding about this new material's properties and potential for future study.
If this is real then it is truly ground-breaking