Saining Xie (@sainingxie) “📸latest in our cambrian series: cambrian-p, p for pose. i think pose is probabl” — TopicDigg

Saining Xie
@sainingxie
cofounder & chief science officer at @amilabs | faculty @nyu_courant | prev: @googledeepmind @meta (fair) @ucsandiego | ynwa
Joined July 2020
1.6K Following    39.5K Followers
📸latest in our cambrian series: cambrian-p, p for pose. i think pose is probably the minimal sufficient 3d signal (and it’s easy to get!) that we need for robust video multimodal models -- jointly modeling frames and pose turns image sequences into a globally grounded structure.
Camera pose matters for video understanding! Today's MLLMs excel at recognizing activities, but still struggle with the underlying space and ego/object dynamics in video. We trace this gap to a missing piece: camera pose. Introducing Cambrian-P: a multimodal LLM natively grounded in camera pose. (1/n)