Need to find a harder problem for /goal. I haven't even finished my coffee yet.
Introducing /goal in Grok Build.
Execute long-running tasks autonomously, with multiple rounds of subagents implementing and verifying a single goal.
Getting used to being liked likely means you are overfit to RLHF.
The problem with overfitting is that the pain overwhelms the limbic system once you try to sample trajectories outside the known distribution.
As more people like you, your sampling regime becomes smaller and smaller to avoid negative feedback. Eventually you get stuck and become a slave to your own feelings.
That’s why I have never seen a model student happy once they become a “model”. Their weights are frozen and cannot be updated anymore. They cannot risk being better than their own SOTA.
Human-to-human interaction is often bandwidth-bound instead of compute-bound.
Thus, the next evolutionary jump would be direct communication in latent space, skipping the long-latency encoder-decoder loop.
If your sole value is identity, then someone else will use a more extreme identity to displace you.
The best signal-to-noise ratio is the product you use in your hands, not critics, marketing, or reviews.
The sharpest eyes with the smartest brain.
👁️ 🤝 🧠
In IIHS pedestrian front crash prevention tests, @Cybertruck avoided every single collision – daytime, nighttime & different angles
It was also the only pickup to earn Top Safety Pick+ (highest award) in 2026
How’s your Father’s Day going? Mine is fixing a burst pipe. Very fitting.
If you are visiting the US for the World Cup ⚽️, please make sure to rent a Tesla and experience Full Self-Driving.
Many people think any given ML project is 99% training.
In reality, it’s 50% evaluation, 40% data cleaning, 8% integration, and 2% training.
The first two set the noise floor for learning. No amount of ML magic helps here: the model cannot go below the noise floor, because that floor is the Shannon bound on encoding your data.
Thus, not a single day goes by without me thinking about ontology. Even the old labels have to be constantly reviewed.
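One way to read the noise-floor point, as a toy sketch rather than anything from a real project: if a fraction of the eval labels is wrong, even a perfect model cannot score above roughly 1 minus that fraction. The function name `measured_accuracy`, the binary task, and the 10% flip rate are all assumptions made up for illustration.

```python
import random


def measured_accuracy(flip_rate, n=100_000, seed=0):
    """Accuracy of a *perfect* model scored against noisy labels (toy setup)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        truth = rng.random() < 0.5                  # hidden ground truth
        flipped = rng.random() < flip_rate          # labeling mistake?
        label = (not truth) if flipped else truth   # what the eval set records
        prediction = truth                          # the model is always right
        correct += prediction == label
    return correct / n


clean = measured_accuracy(0.0)   # no label noise
noisy = measured_accuracy(0.1)   # assumed 10% of labels are wrong
```

With clean labels the perfect model scores 1.0; with 10% flipped labels it lands near 0.9, and no change to the model moves that number. Only reviewing the labels does.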
At 7 a.m., Grok Build would wake me up and tell me what they had done last night—experiments, bug fixes, and the plans for today.
Rinse and repeat.
Figuring out the right eval is 100x harder than gradient descent itself.
There are many good X articles lately, some superb, intellectual, and honest.
They are 1000x better than opinion columns in newspapers. I noticed my chat group has more X article links than links from MSM.
In the era of agentic AI, speed in reacting to truthful information is everything, or gradient descent can quickly go in another direction.
A rare sign of enlightenment.
Casually using Grok @imagine to one-shot a sword fight scene in the bamboo forest (5 mins). Pretty good for the first try.
Grok Build is pretty good at optimizing my code in one shot.
Prompt:
I want you to optimize it entirely on GPU to speed it up. Measure two metrics: the result must compare with the golden image (CPU) and be nearly identical (PSNR > 40dB), with fast pixels per second.
Make a plan to a) write GPU equivalent code, b) write a benchmark suite to measure PSNR and pixels per second, c) execute various optimization strategies.
Go!
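For reference, the two metrics in the prompt can be sketched in a few lines. This is not what Grok Build generated; it is a minimal pure-Python sketch assuming 8-bit pixels stored as flat lists, with `psnr`, `pixels_per_second`, and the tiny stand-in images all being hypothetical names and data.

```python
import math
import time


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(candidate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def pixels_per_second(render, num_pixels, repeats=5):
    """Best-of-N throughput for a hypothetical zero-argument render() callable."""
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        render()
        best = min(best, time.perf_counter() - start)
    return num_pixels / best if best > 0 else math.inf


golden = [10, 20, 30, 40]    # stand-in for the CPU golden image
gpu_out = [10, 21, 30, 40]   # stand-in for the GPU result, one pixel off by 1
score = psnr(golden, gpu_out)
passed = score > 40.0        # the threshold from the prompt
```

Taking the best of several timed runs is one common way to reduce warm-up noise; a real GPU benchmark would also need to synchronize the device before stopping the clock.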
Welcome to the future 🇩🇰🤗
FSD Supervised now approved in Denmark 🇩🇰
Rollout will begin soon
As eval is downstream of everything, it determines whether you will spend your time optimizing the right metrics.
The current gap between academia and industry AI labs is the attitude toward eval.
In academia, the eval set is very hard to change since a) you need to explain why your eval is better and b) you need to benchmark against your cited works with the new eval and show that your work is superior.
Doing both a and b at the same time invites risky rebuttal, even if you are doing a good job on a. It is far easier to benchmark against the eval set that everyone has agreed upon.
In contrast, in industry AI labs, customer feedback is your eval set and it keeps changing to cover the long tail that you could never think of during years of PhD programs.
If the loss functions are not a good proxy for customer feedback, then you change them until both are aligned.
Thus, academia might train students who are very good at hill climbing but inexperienced in building eval sets that capture hard real cases. To move the needle, building the right eval set matters the most.
Among many AI products, Full Self-Driving probably delivers the most consistent results—no special harness, skills, plugins, or secret prompts required.
Have you spent time inspecting reasoning traces of your kids? 🙂
Grok Build sub-agent swarm weekend fun. You can reuse the prompt for your projects:
Read the proof of ` and come up with a few different examples with more points:
a) Please understand the proof
b) Come up with a plan
c) Orchestrate and launch sub-agents to execute the plan step-by-step
d) Validate the results from the sub-agent, and correct them
e) Repeat b, c, d, until you are happy with the result and its correctness
DO NOT stop until the goal is reached
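The b, c, d, e loop in that prompt has a simple shape. Below is a rough sketch of it, not Grok Build's actual orchestration: `run_goal`, the `plan`/`execute`/`validate` callables, and the toy steps are all hypothetical, and the `max_rounds` cap is an added safety assumption that the prompt itself does not have.

```python
def run_goal(plan, execute, validate, max_rounds=10):
    """Plan, run each step, validate, and re-plan the failures until all pass."""
    accepted = {}
    pending = plan([])                      # b) initial plan, no feedback yet
    for _ in range(max_rounds):
        failures = []
        for step in pending:
            result = execute(step)          # c) hand the step to a sub-agent
            if validate(step, result):      # d) check the sub-agent's result
                accepted[step] = result
            else:
                failures.append(step)
        if not failures:
            return accepted                 # goal reached
        pending = plan(failures)            # e) repeat with the failed steps
    raise RuntimeError("goal not reached within max_rounds")


# Toy stand-ins: "example-2" only validates on its second attempt.
attempts = {}

def toy_plan(feedback):
    return feedback or ["example-1", "example-2"]

def toy_execute(step):
    attempts[step] = attempts.get(step, 0) + 1
    return f"{step}:attempt{attempts[step]}"

def toy_validate(step, result):
    return step != "example-2" or result.endswith("attempt2")

outcome = run_goal(toy_plan, toy_execute, toy_validate)
```

Keeping validation separate from execution is the point: the sub-agent that did the work is not the one that decides whether it is correct.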
May this era be the new dawn of humanity.