Scene-Aware 3D Multi-Human Motion Capture from a Single Camera

dc.contributor.author: Luvizon, Diogo C. [en_US]
dc.contributor.author: Habermann, Marc [en_US]
dc.contributor.author: Golyanik, Vladislav [en_US]
dc.contributor.author: Kortylewski, Adam [en_US]
dc.contributor.author: Theobalt, Christian [en_US]
dc.contributor.editor: Myszkowski, Karol [en_US]
dc.contributor.editor: Niessner, Matthias [en_US]
dc.date.accessioned: 2023-05-03T06:10:51Z
dc.date.available: 2023-05-03T06:10:51Z
dc.date.issued: 2023
dc.description.abstract: In this work, we consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users as it enables an affordable 3D motion capture that is easy to install and does not require expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. Thus, we introduce the first non-linear optimization-based approach that jointly solves for the 3D position of each human, their articulated pose, their individual shapes as well as the scale of the scene. In particular, we estimate the scene depth and person scale from normalized disparity predictions using the 2D body joints and joint angles. Given the per-frame scene depth, we reconstruct a point-cloud of the static scene in 3D space. Finally, given the per-frame 3D estimates of the humans and scene point-cloud, we perform a space-time coherent optimization over the video to ensure temporal, spatial and physical plausibility. We evaluate our method on established multi-person 3D human pose benchmarks, where we consistently outperform previous methods, and we qualitatively demonstrate that our method is robust to in-the-wild conditions, including challenging scenes with people of different sizes. Code: https://github.com/dluvizon/scene-aware-3d-multi-human [en_US]
dc.description.number: 2
dc.description.sectionheaders: Capturing Human Pose and Appearance
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 42
dc.identifier.doi: 10.1111/cgf.14768
dc.identifier.issn: 1467-8659
dc.identifier.pages: 371-383
dc.identifier.pages: 13 pages
dc.identifier.uri: https://doi.org/10.1111/cgf.14768
dc.identifier.uri: https://diglib.eg.org:443/handle/10.1111/cgf14768
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd. [en_US]
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/
dc.subject: CCS Concepts: Computing methodologies -> Motion capture; Scene understanding
dc.subject: Computing methodologies
dc.subject: Motion capture
dc.subject: Scene understanding
dc.title: Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [en_US]
Files
Original bundle
Name: v42i2pp371-383_cgf14768.pdf
Size: 5.55 MB
Format: Adobe Portable Document Format
Name: paper1066_mm.mp4
Size: 96.73 MB
Format: Unknown data format
Collections