John Carmack (@ID_AA_Carmack)

AGI at Keen Technologies, former CTO Oculus VR, Founder Id Software and Armadillo Aerospace
286 Following    2.5M Followers
There are a few things that I look back on as my mistakes in the early days. Quake was overly ambitious technically. We could have done all the great multiplayer and modding work inside a Doom++ engine, allowing the designers to work with a more stable base instead of rug-pulling everything out from underneath them a couple times. The follow up game could have then brought in full 6DOF environments and characters. I pushed everyone too hard. I didn’t appreciate how maturing companies need more slack, and that running people at startup intensity constantly will wear them out. Quake was also where I really had to accept my personal limits. I was working pretty much as hard as humanly possible, and I was still slipping past my goal points. On all of the founders’ shoulders, our original corporate stock arrangement and buy/sell agreement was a mistake, and resulted in bad incentives. We wanted to ensure that all ownership rested in the hands of people working hard on current projects, but the Silicon Valley standard approach of vesting stock would have worked out better. One real problem that I don’t accept the blame for is that we were insisting that level designers be not just game designers, but also have strong visual design esthetics. They needed to make things that not only played well, but looked awesome, and it got more challenging as the technology provided a richer palette. Romero covered that well, which set our company expectations early on. We should have figured out how to pair up artists and designers earlier, but there was infighting among the designers, and the ones that could manage the visuals were happy to disparage the ones that couldn’t. Sorry, Sandy.
I’m reading an old collection of interconnected science fiction stories by Jerry Pournelle, written in the early 70s. His best books were later co-authored with Larry Niven, but it is still solid work in my favored “competence porn” genre, with entrepreneurs as protagonists. It stands out to me that he was despairing for America when he wrote the stories. Things looked bad at the time, and his fiction projected it into the future. Social unrest, Vietnam, Watergate, economic recession, energy crisis, and for a patriotic space guy, abandoning Apollo. The backdrop for the stories was that America was unfixable, which is, of course, a motivation to go to space in fiction, but I do think he was genuinely worried by what he saw around him. But over the next decade, things got better, and Jerry had a front row seat for the rise of the technology sector, writing the Chaos Manor column in Byte magazine for many years. He also got to see the founding of SpaceX, a company straight out of a hard SF novel, and they re-flew a landed rocket shortly before he died. Trends aren’t fate. Bad situations can be fixed, and good ones still need to be defended. RIP Jerry, I’m glad you got to see things turn around.
If you are asking “Why push back against anti-datacenter efforts?” I consider it a tragedy that anti-nuclear efforts largely strangled nuclear power in the US based on vibes, and I don’t want to see that happen to AI. Public opinion matters, and it shouldn’t be ceded unchallenged. If you are asking “Why should I support AI efforts at all?” I believe we are in the midst of a transition more vibrant than the industrial revolution. Opinions formed a couple of years ago about the uselessness of AI are no longer valid. Millions of people and organizations are getting great returns from using it, and the demand for data centers is the market responding to the value signal. That is how progress is made!
I'm a little disappointed with myself that the high school algebra identity didn't occur to me right away.
I've been coding for 40 years. Here are the top 5 things I wish I knew when I started.
1. 90% of the job is debugging and fixing, not creating new code. Which is still fun if you're good at it. I used to think programming was mostly writing fresh, clever stuff. In reality, most of your time is spent in other people's (or your own past self's) messy code, chasing down why something that "should" work doesn't. Get really good at debugging early. Learn assembly reading, call stacks, and kernel debuggers. It pays off hugely. The best engineers I saw were absolute magicians at this.
2. Manage complexity from day one (i.e., don't write slop and "fix it later" if it goes somewhere). Very early on, I'd hammer out code and refactor afterward. Big mistake. Now I start with clean, skeletal structure (minimalism first) and flesh it out carefully, with AI or not. Messy code compounds and becomes unfixable. Upfront discipline on architecture, naming, and simplicity saves enormous pain later, especially in large systems like Windows.
3. Tools and processes matter more than you think. We suffered with basic diff/manual deltas instead of modern source control like Git. Branching, testing, and good tooling would have made porting and collaboration way smoother. Invest in your environment, automation, and reproducible builds early. Good tools amplify your output; bad ones (or none) drag everything down.
4. Understand the problem and existing code deeply before writing. Don't jump straight to coding. Map out the problem, study what's already there (you'll inherit a lot), and plan. Low-level knowledge (hardware quirks, alignment issues on different architectures like MIPS/Alpha) was crucial. Also: assert early and often. It forces clarity.
5. People, politics, and "the right tool for the job" beat pure tech arguments. Brilliant engineers still argue endlessly. Sometimes it's about ego, not merit. Learn to spot the difference and "steer" the conversation rather than "winning" it.
Bonus from experience: Side projects like Task Manager (started at home because I wanted the tool) can become your biggest hits. Ship small, useful things often. If you're just starting, focus on fundamentals, patterns over syntax, and building resilience for the long haul. It's going to be a wild ride, but the fundamentals still matter.
My reply to someone considering starting a video game company: The distribution of possible rewards for starting a video game company is generally not very good today. The market is well served, and gaining a foothold requires strong execution on both business and product issues, along with a substantial amount of luck. Plan to burn through seven figures with a not-great chance of making it back. If you do go for it, some bits of advice: Identify your customers clearly before you start. Not just a broad community, but specific people, and imagine them as you make decisions. Initially, build the smallest, most concise game you can imagine anyone paying for. It will still take much longer than you expect. Once something exists, hill-climb the value. Hopefully you will have some elements that clearly bring joy to people, which you can magnify. There will inevitably be tons of things that people find confusing, frustrating, or just boring that you will need to fix.
Space launch was a clear case where there was a large difference in efficiency between what was possible and what was done in practice before SpaceX. A large part of that was due to everything being locked in to what (just barely) already worked, with huge risk aversion. With national prestige or a half-billion-dollar geosync satellite on the line, speculative engineering ideas that might result in a public debacle were not welcome. When failure is not an option, success can stay very expensive. You need to experiment to improve, and that fundamentally means being comfortable with failure. If you know it is going to work, it isn’t an experiment. I have long believed that nuclear power today is in precisely the same state as space launch two decades ago, but the even more pressing question now is whether semiconductor fabrication might also be. On the one hand, Moore’s Law has been a sequence of heroic miracles of technology at the wafer fabrication level, grinding out hundreds of compounding small improvements. On the other hand, fabs are “too big to fail”, and there are elements of extreme conservatism at play. Intel’s “Copy exactly!” fab development exemplifies that mindset – instead of every new building being an opportunity to explore and optimize processes, it was deemed more valuable to just replicate. While each individual machine may be straining against physical limits of technology, it is possible that the systems orchestrating them all together could be far from optimal. The explore / exploit axis is fundamental to all decision making, but human risk avoidance probably biases away from optimal exploration.
New @BeatSaber music pack is out, and I must be one of the first to play, landing a top-10 score that will surely be out of the top 100 by tomorrow.
Some people are misreading this -- 511x511 was FASTER. It looks like at 512x512 and above it falls to another path that requires internal cudaMalloc/cudaFree calls.
GPU library performance can be very notchy -- runtime of batched torch.linalg.solve_ex() went up by over 10x going from 511x511 matrices to 512x512.
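A minimal sketch of how such a cliff could be hunted for: time a workload over neighbouring sizes and flag adjacent sizes whose runtime jumps by a large factor. The names `time_sizes`, `find_notches`, and `fake_solve` are hypothetical, the workload is a pure-Python stand-in rather than a real batched solve, and the synthetic timings are made-up numbers shaped like the reported 10x jump. When timing real GPU work, the device has to be synchronized (for example with `torch.cuda.synchronize()`) before each clock read, otherwise only the kernel launch is measured.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_sizes(workload: Callable[[int], None], sizes: List[int],
               repeats: int = 5) -> Dict[int, float]:
    """Return the best-of-`repeats` wall-clock time for each problem size."""
    timings = {}
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            workload(n)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


def find_notches(timings: Dict[int, float],
                 ratio: float = 3.0) -> List[Tuple[int, int, float]]:
    """Flag adjacent sizes whose runtime jumps by more than `ratio`."""
    sizes = sorted(timings)
    notches = []
    for small, large in zip(sizes, sizes[1:]):
        if timings[small] <= 0.0:
            continue  # timer resolution too coarse to form a ratio
        jump = timings[large] / timings[small]
        if jump > ratio:
            notches.append((small, large, jump))
    return notches


def fake_solve(n: int) -> None:
    """Hypothetical stand-in workload: pure-Python work growing with n*n."""
    sum(i * i for i in range(n * n // 64))


# Real measurements of the stand-in workload (no cliff expected here).
measured = time_sizes(fake_solve, [509, 510, 511, 512, 513])

# Made-up timings in seconds, shaped like the cliff described above.
synthetic = {510: 1.00e-3, 511: 1.01e-3, 512: 1.20e-2, 513: 1.21e-2}
print(find_notches(synthetic))
```

Taking the best of several repeats rather than the mean keeps one-off stalls from being mistaken for a notch.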
I was on a cruise ship last week (Star of the Seas), and they had pods of 10 elevators in a circle, where you picked your destination floor on a pad, and it directed you to the correct elevator, which was often behind you. It seemed to work efficiently, but multiple times I saw people tap their floor and just look away, conditioned for normal elevator operation, and miss the arrival of the elevator they were supposed to get on. Addressing my normal pet peeve of interaction feedback latency would have helped — with all the fades and slides, it takes over a second for the first hint of the elevator to show up, and two seconds for it to fully stabilize. That may not seem like much in some circumstances, but it is plenty of time for people to look away. The elevator letter should appear instantaneously, maybe with some festive animation around it to hold attention that was on the button press. Even better would be to add a localized audio cue from the elevator the instant you pressed the button, which would let you immediately know where it is without having to scan for the lighted letter. (the Starlink internet on the ship was excellent, allowing me to get some work in at sea)
It is generally frowned upon to have LLMs precisely regurgitate part of their training set, but it is an interesting question how you could use LLM training to nearly losslessly compress a huge corpus like the entirety of the Internet Archive. The Hutter Prize is for perfect compression, but only of one GB. There would be different trades at the PB level, and it gets much more interesting when it doesn’t have to be bit-accurate.
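For the bit-accurate case, the link between prediction and compression can be made concrete: an ideal arithmetic coder spends about -log2(p) bits on a symbol the model predicted with probability p, so better prediction means a smaller file. The sketch below is only an illustration under stated assumptions: `ideal_code_length_bits` is a hypothetical name, and an adaptive byte-frequency model with add-one smoothing stands in for the LLM. It computes the size bound, not an actual encoded stream, and it does not address the near-lossless variant.

```python
import math
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Bits an ideal arithmetic coder needs under an adaptive order-0 byte model."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for byte in data:
        # Add-one smoothed probability of this byte given the bytes seen so far.
        # A decoder can rebuild the same counts, so no model has to be shipped.
        p = (counts[byte] + 1) / (seen + 256)
        total_bits += -math.log2(p)
        counts[byte] += 1
        seen += 1
    return total_bits


text = b"the quick brown fox jumps over the lazy dog. " * 200
bits = ideal_code_length_bits(text)
print(f"{bits / len(text):.2f} bits per byte vs 8.00 raw")
```

Swapping in a stronger predictor changes only the line that computes `p`; at corpus scale, the size of the model itself would also have to be counted against the savings.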
A Canticle For Leibowitz is a classic early (1959) post-apocalypse novel where an order of monks preserved the last remnants of learning (the memorabilia) after a nuclear exchange turned the remains of society into book and scientist burners. I first read it in the 80s as a mass market paperback that I somehow lost along the way. Other paperbacks from that time are yellow with age and getting brittle, but still readable. I read it again in the late 2000s on a first edition Kindle. I eventually migrated to iPads for Kindle reading, but every couple years I would come across an old Kindle in a drawer, charge it up, and check out what I had been reading on it. They eventually stopped working entirely. I’m just finishing reading a new Folio Society edition, printed on heavy, acid-free archival quality paper. If it doesn’t get soaked or burned, it could still be in good shape for centuries. The ephemeral nature of digital storage does give me some pause. We can still read Sumerian tablets full of administrative trivia from four thousand years ago, but there are no known copies of some important software products from just fifty years ago. I am a proud supporter of the Internet Archive!
FLOPS was originally “floating point operations per second”, specifying a rate of work for a system: A SPARCstation 2 gave 4.2 MFLOPS. Today you also see it used as “floating point operations” for an algorithm, or an amount of work: This layer takes 8 GFLOPS.
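The two senses relate by simple division: an amount of work divided by a rate of work gives a time. A small sketch, assuming the usual convention of counting a matrix multiply as one multiply and one add per inner-product term; the function names and the 1024x1024 matrix size are illustrative choices, and the 4.2 MFLOPS rate is the figure quoted above.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """Floating point operations (an amount) for an (m x k) @ (k x n) product."""
    return 2 * m * k * n  # one multiply plus one add per inner-product term


def seconds_at(work_flops: float, rate_flops_per_s: float) -> float:
    """Time = amount of work / rate of work."""
    return work_flops / rate_flops_per_s


work = matmul_flops(1024, 1024, 1024)  # amount: 2_147_483_648 operations
rate = 4.2e6                           # rate: 4.2 MFLOPS
print(work, seconds_at(work, rate))    # roughly 511 seconds at that rate
```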
Rhymes with @RichardSSutton’s Bitter Lesson.
A computer can do anything provided you learn to tell it how. Very recently, this has become vastly easier to do. Chalk up another victory for Carmack’s Law:
Making a scatter plot of 400_000 data points, some of the plots had odd gaps in coverage. It took me a little while to realize that it was only when the data was farther from the origin -- it was the raw bfloat16 precision. Everything looks great from -1 to 1, but as you go past 2 and 4, the coverage gaps get larger. My intuition didn't have it being quite so "discretely countable" at those modest numeric values. Float32 for comparison.
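The widening gaps follow from the format: bfloat16 keeps 8 significant bits (1 implicit, 7 stored), so every power-of-two interval holds the same 128 values and the spacing doubles each time the magnitude crosses a power of two. A sketch using only the standard library; `to_bfloat16` and `bfloat16_spacing` are hypothetical helper names, rounding is assumed to be round-to-nearest-even, and inputs are assumed to be finite, nonzero, and in the normal range.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the top 16 bits of a float32; round using the discarded half.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bfloat16_spacing(x: float) -> float:
    """Gap between neighbouring bfloat16 values around x."""
    exponent = math.frexp(abs(x))[1] - 1  # floor(log2(|x|))
    return 2.0 ** (exponent - 7)          # 7 stored fraction bits


for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(x, bfloat16_spacing(x))

print(to_bfloat16(4.01))  # 4.0
print(to_bfloat16(4.02))  # 4.03125
```

Just above 4.0 the representable values are about 0.03 apart, which is coarse enough to show up as visible stripes in a dense scatter plot.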
My library donation project for the LFS found homes for twenty sets of books, so I ordered another batch:
So many judging tasks could be improved by aggregating partial orderings, and in the limit, just ordering pairs. The annual Libertarian Futurist Society novel awards discussion is starting, and while I would like to participate on some level, there is no way I have time to read an entire slate of novels. However, I will likely read at least two from the list, and I could give a relative assessment. This cries out for the use of something like Elo ranking, as in chess competition, perhaps with some suggestions to get sufficient coverage. Peer and out-of-chain employee performance calibrations could probably also benefit from a greater quantity of sparse pairwise comparisons.
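A minimal sketch of the idea, assuming each judge submits (better, worse) pairs for the novels they happened to read. The update is the standard Elo rule; replaying the fixed set of comparisons for several passes is a simplification chosen here so a small, sparse sample settles into an ordering (fitting a Bradley-Terry model would be a more principled alternative). The novel labels, `k`, `start`, and `passes` values are all illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def elo_ratings(comparisons: Iterable[Tuple[str, str]], k: float = 32.0,
                start: float = 1500.0, passes: int = 20) -> Dict[str, float]:
    """Fit Elo-style ratings from (winner, loser) pairs."""
    pairs = list(comparisons)
    ratings = defaultdict(lambda: start)
    for _ in range(passes):
        for winner, loser in pairs:
            # Expected score of the winner under the Elo logistic curve.
            expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
            delta = k * (1.0 - expected)
            ratings[winner] += delta  # zero-sum: the loser gives up
            ratings[loser] -= delta   # exactly what the winner gains
    return dict(ratings)


# Hypothetical judgments: no single judge read the whole slate.
judgments = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
ratings = elo_ratings(judgments)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Coverage matters more than volume: a novel that never appears in a pair with the rest of the slate cannot be placed relative to it, which is where suggested reading pairs would help.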
Without getting all the way down to performance counters, GPU power from nvidia-smi is a better indicator of true utilization than job scheduling or “gpu busy”. I would love to see animated “heat maps” of the big data centers, with each pixel being an individual GPU’s power draw. I am confident that inference and frontier training at the big labs are highly efficient, but I wonder how many GPUs would be dark due to scheduling and inefficient research code. With a little calibration for base load and peak, just the power bill for the datacenter would be a pretty good first-order indicator of utilization.
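The calibration step can be sketched as a linear map from power draw to a 0..1 utilization estimate. Everything numeric below is an assumption for illustration: the idle and peak wattages are not measured values, the readings are invented, and a linear relation between power and useful work is itself only a first-order model. Per-GPU readings of this kind can be obtained with `nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits`.

```python
from typing import List


def utilization_from_power(power_w: float, idle_w: float, peak_w: float) -> float:
    """Linear estimate: 0.0 at idle draw, 1.0 at calibrated peak draw."""
    if peak_w <= idle_w:
        raise ValueError("peak_w must exceed idle_w")
    fraction = (power_w - idle_w) / (peak_w - idle_w)
    return min(1.0, max(0.0, fraction))  # clamp noisy readings into range


# Assumed calibration values in watts (hypothetical, not measured).
IDLE_W, PEAK_W = 70.0, 700.0

# Invented per-GPU power readings in watts.
readings: List[float] = [72.0, 310.5, 655.0, 80.0]

per_gpu = [utilization_from_power(p, IDLE_W, PEAK_W) for p in readings]
fleet = sum(per_gpu) / len(per_gpu)
print([round(u, 2) for u in per_gpu], round(fleet, 2))
```

The same function applied to a whole-facility power figure, with base load as `idle_w`, gives the power-bill version of the estimate.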
#PaperADay# 10 LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics The comments on #PaperADay# 3 recommended this paper as the state of the art JEPA paper, and it does look much better! They acknowledge that much of the prior JEPA research is ad-hoc and full of heuristics, but here they make strong theoretical claims of optimality and provide proofs (which I did not read). The first claim is that isotropic gaussian is the unique optimal embedding distribution for both linear and nonlinear probing, minimizing worst-case risk across downstream tasks. I would have taken that on faith with just a “sounds good to me”, but they go into it with details and examples. Actually getting an isotropic gaussian in high dimensions is easier said than done. They present Sketched Isotropic Gaussian Regularization (SIGReg) as a well behaved loss function to achieve this after analyzing a number of different statistical tests, and they claim it beats the curse of dimensionality with linear scalability. The final loss is just a blend factor to weight the JEPA prediction loss against the SIGReg isotropy loss. This is the one tunable hyperparameter for LeJEPA. Despite the P in JEPA, they don’t use predictor networks here, they just directly compare view embeddings for the JEPA loss. Predictor networks could still be useful for video sequences, especially when conditioned with action information for agents / robots. Each training image is augmented to produce 2 global views and 6 local views with different spatial scales but the same set of color and geometric transformations. The loss is the average MSE between the average of the global view embeddings and each of the local view embeddings. I don’t have a good feel for the tradeoffs in their view transforms, which still seem very much in the ad-hoc space, but they will determine the nature of what gets filtered out of the representation. 
Learning what doesn’t matter is critical, but the specification of “matters” is only implicit in the view transformations. LeJEPA itself is architecture independent – anything that digests a batch of samples from a dataset into vectors can be used. Vision transformers, MLP, ConvNets, etc. The specific augmentations for views would be input modality specific, but the LeJEPA algorithm could work on audio, images, video, or other things. They show that the LeJEPA loss on a large foundation model is very indicative of downstream task performance, both directly, and with a heuristic to improve the predictive power of the loss further. They also show that it can be used to train from scratch on small datasets with as few as 1000 samples and achieve better results than probing a conventional general foundation model. I was pleased to see sample code blocks in the paper instead of Greek-laden pseudocode, as well as a GitHub repo. Appendix D has interesting details on generating good coverage of unit hyperspheres with low discrepancy samples by transforming Sobol sequences, but this is only for their theoretical analysis, and they show you are better off just making new random hypervectors every batch, with even 16 random vectors outperforming a fixed set of thousands. Some questions: In the discussion of non-linear probing, only kNN and kernel methods are mentioned, presumably for their theoretical analysis tractability, but would an MLP generally perform better? A JEPA embedding is not fully reversible like NICE or a RevNet, so how does it react to inputs that are far outside the training set? Will novel inputs map to unique embeddings, or could they be collapsed onto the codes from the training set? How would the embeddings evolve in a continuous learning environment, as novel inputs are added to the training mix? Can a JEPA be overtrained – is lower training loss always better, or would there be an optimal early stopping point?
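The prediction term described in the summary above (average MSE between the mean of the global view embeddings and each local view embedding) can be sketched in a few lines. This is an illustration of that one sentence, not the paper's code: the function names are hypothetical, embeddings are plain Python lists, the SIGReg isotropy term is not implemented and is passed in as a number, and the convex-combination form of the blend is an assumption about how the single hyperparameter is applied.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def jepa_prediction_loss(global_views: List[Vector],
                         local_views: List[Vector]) -> float:
    """Average MSE between the mean global embedding and each local embedding."""
    target = mean_vector(global_views)
    return sum(mse(target, v) for v in local_views) / len(local_views)


def blended_loss(prediction: float, isotropy: float, blend: float) -> float:
    """Assumed form: weight the prediction term against an isotropy term."""
    return (1.0 - blend) * prediction + blend * isotropy


# Toy 2-D embeddings: 2 global views and 2 local views of one image.
global_views = [[1.0, 0.0], [0.0, 1.0]]   # mean is [0.5, 0.5]
local_views = [[0.5, 0.5], [1.5, 0.5]]
loss = jepa_prediction_loss(global_views, local_views)
print(loss)  # 0.25
```

In the setup described above there would be 2 global and 6 local views per image, with the embeddings coming from whatever encoder is being trained.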