Alvaro Cintas (@dr_cintas)

Educating about AI, Cybersecurity and Technology | Professor | PhD in Computer Science & Engineering
191 Following    130.5K Followers
NVIDIA just made AI detect objects 10x faster by deleting one step.

It's called LocateAnything, and it removes the biggest bottleneck no one else was fixing in vision-language models.

Normally a model builds each bounding box one coordinate token at a time. 100 objects means thousands of tokens before an answer.

NVIDIA scrapped that: their Parallel Box Decoding predicts the whole box in a single forward pass, as one atomic unit.

→ 12.7 boxes/sec on one H100
→ 10x faster than Qwen3-VL
→ +3.8% F1 on LVIS, accuracy up, not down
→ 3B params, runs on one consumer GPU

Treating the box as one unit keeps its coordinates tied together, which is why accuracy climbed instead of falling.

One model handles detection, GUI grounding, OCR, and document understanding, ready for computer-use agents, robotics, and document pipelines.

100% open source, weights, code, demo, and paper all live.
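
To make the "thousands of tokens" point concrete, here is a rough back-of-the-envelope sketch. It is not the LocateAnything code: the function names are hypothetical, and the numbers (4 coordinates per box, 5 tokens per coordinate for digits plus a separator, one decoding step per box in the parallel case) are assumptions chosen only for illustration.

```python
def sequential_steps(num_boxes: int,
                     coords_per_box: int = 4,
                     tokens_per_coord: int = 5) -> int:
    """Decoding steps when every coordinate is emitted token by token.

    coords_per_box and tokens_per_coord are illustrative assumptions,
    not figures taken from the post or the paper.
    """
    return num_boxes * coords_per_box * tokens_per_coord


def parallel_steps(num_boxes: int) -> int:
    """Decoding steps when each box is emitted as one atomic unit.

    Assumes one forward pass per box, as the post describes.
    """
    return num_boxes


if __name__ == "__main__":
    n = 100  # number of objects in the image
    print("token-by-token steps:", sequential_steps(n))
    print("one-box-per-pass steps:", parallel_steps(n))
```

Step counts are only a proxy: the real speedup also depends on per-step cost, batching, and the rest of the output, so this ratio should not be read as a reproduction of the 10x figure quoted above.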