Lester Li (@sizhe_lester_li)

2026.06.23 16:34

Robot learning is moving beyond policies built for one robot, one scene, one task. At MIT, we’re exploring a different path: turning video world models into embodiment-agnostic robot policies. Introducing VERA: a 14B video-to-action system that controls robots across embodiments, skills, and environments. From zero-shot pick-and-place on a real Panda arm to contact-rich cube reorientation with a 16-DoF robotic hand. Different robots. Different environments. Different tasks. Same video planner. Same weights. We’re open-sourcing everything so you can fine-tune VERA for your own robot setup too. Deep dive in the thread: 🔗 🧵 (1/7)
