Paper | April 2025

CoMotion: Concurrent Multi-Person 3D Motion

Authors: Alejandro Newell, Peiyun Hu, Lahav Lipson, Stephan R. Richter, Vladlen Koltun

We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets, leveraging pseudo-labeled annotations, to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate in tracking multiple people through time.

Related readings and updates.

May 30, 2025 | Research areas: Computer Vision, Methods and Algorithms | Conference: CVPR

As diffusion models dominate visual content generation, efforts have been made to adapt these models for multi-view image generation to create 3D content. Traditionally, these methods implicitly learn 3D consistency by generating only RGB frames, which can lead to artifacts and inefficiencies in training. In contrast, we propose generating Normalized Coordinate Space (NCS) frames alongside RGB frames. NCS frames capture each pixel's global…

October 23, 2023 | Research area: Computer Vision | Conference: ICCV

Dense 3D reconstruction from RGB images traditionally assumes static camera pose estimates. This assumption has endured, even as recent works have increasingly focused on real-time methods for mobile devices. However, the assumption of one pose per image does not hold for online execution: poses from real-time SLAM are dynamic and may be updated following events such as bundle adjustment and loop closure. This has been addressed in the RGB-D…

World-Consistent Video Diffusion With Explicit 3D Modeling

LivePose: Online 3D Reconstruction from Monocular Video with Dynamic Camera Poses
