Robot learning is moving beyond policies built for one robot, one scene, one task.
At MIT, we’re exploring a different path: turning video world models into embodiment-agnostic robot policies.
Introducing VERA: a 14B-parameter video-to-action system that controls robots across embodiments, skills, and environments.
From zero-shot pick-and-place on a real Panda arm to contact-rich cube reorientation with a 16-DoF robotic hand.
Different robots. Different environments. Different tasks.
Same video planner. Same weights.
We’re open-sourcing everything so you can fine-tune VERA for your own robot setup too. Deep dive in the thread:
🔗
🧵 (1/7)