Real-time 3D Human Body Pose Estimation from Monocular RGB Input
dc.contributor.author | Mehta, Dushyant | |
dc.date.accessioned | 2021-01-20T08:34:42Z | |
dc.date.available | 2021-01-20T08:34:42Z | |
dc.date.issued | 2020-10 | |
dc.description.abstract | Human motion capture finds extensive application in movies, games, sports, and biomechanical analysis. However, existing motion capture solutions require cumbersome external and/or on-body instrumentation, or use active sensors whose power consumption limits the possible capture volume. The ubiquity and ease of deployment of RGB cameras make monocular RGB-based human motion capture an extremely useful problem to solve, as it would lower the barrier to entry for content creators to employ motion capture tools and enable new applications of human motion capture. This thesis demonstrates the first real-time monocular RGB-based motion capture solutions that work in general scene settings. They are based on neural network approaches to the ill-posed problem of estimating 3D human pose from a single RGB image, combined with model-based fitting. In particular, the contributions of this work advance three key aspects of real-time monocular RGB-based motion capture: speed, accuracy, and the ability to work in general scenes. New training datasets are proposed for single-person and multi-person scenarios which, together with the proposed transfer-learning-based training pipeline, allow learning-based approaches to be appearance invariant. The training datasets are accompanied by evaluation benchmarks with multiple avenues of fine-grained evaluation. The evaluation benchmarks differ visually from the training datasets, so as to promote solutions that generalize to in-the-wild scenes. The proposed task formulations for the single-person and multi-person cases allow higher accuracy and incorporate additional qualities, such as occlusion robustness, that are helpful in the context of a full motion capture solution. The multi-person formulations are designed to have a nearly constant inference time regardless of the number of subjects in the scene and, combined with contributions towards fast neural network inference, enable real-time 3D pose estimation for multiple subjects. Combining the proposed learning-based approaches with a model-based kinematic skeleton fitting step provides temporally stable joint angle estimates, which can be readily employed for driving virtual characters. | en_US |
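The abstract describes combining learning-based per-frame 3D joint predictions with a model-based kinematic skeleton fitting step to obtain temporally stable joint angles. As an illustration only, the following Python sketch fits the joint angles of a minimal two-bone kinematic chain (shoulder, elbow, wrist) to predicted 3D joint positions with a simple temporal smoothness prior; the bone lengths, function names, cost weights, and optimizer choice are assumptions made for this sketch and are not taken from the thesis.

```python
# Hypothetical sketch: fit Euler joint angles of a two-bone kinematic chain
# (shoulder -> elbow -> wrist) to per-frame 3D joint positions predicted by a
# pose-estimation network, with a smoothness term towards the previous frame.
# Bone lengths and the 0.1 smoothness weight are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

BONE_LENGTHS = [0.30, 0.25]  # upper arm, forearm (metres), assumed values


def rot(euler):
    """Rotation matrix from XYZ Euler angles (radians)."""
    x, y, z = euler
    cx, sx = np.cos(x), np.sin(x)
    cy, sy = np.cos(y), np.sin(y)
    cz, sz = np.cos(z), np.sin(z)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


def forward_kinematics(angles, shoulder):
    """Elbow and wrist positions from the shoulder position and 2x3 Euler angles."""
    a = angles.reshape(2, 3)
    R0 = rot(a[0])
    elbow = shoulder + R0 @ np.array([0.0, -BONE_LENGTHS[0], 0.0])
    R1 = R0 @ rot(a[1])
    wrist = elbow + R1 @ np.array([0.0, -BONE_LENGTHS[1], 0.0])
    return np.stack([elbow, wrist])


def fit_frame(pred_joints, shoulder, prev_angles):
    """Fit joint angles so the chain's joints match the predicted 3D joints,
    while staying close to the previous frame's angles for temporal stability."""
    def cost(angles):
        fk = forward_kinematics(angles, shoulder)
        data_term = np.sum((fk - pred_joints) ** 2)
        smooth_term = 0.1 * np.sum((angles - prev_angles) ** 2)
        return data_term + smooth_term

    res = minimize(cost, prev_angles, method="L-BFGS-B")
    return res.x


# Example usage with made-up inputs standing in for network predictions.
shoulder = np.zeros(3)
prev_angles = np.zeros(6)
pred_joints = np.array([[0.05, -0.28, 0.05],   # elbow
                        [0.10, -0.48, 0.15]])  # wrist
angles = fit_frame(pred_joints, shoulder, prev_angles)
```

Fitting angles of a fixed-bone-length skeleton rather than using the raw per-joint predictions keeps bone lengths constant over time and yields joint-angle output that can directly drive a virtual character, which is the role the kinematic fitting step plays in the pipeline described above.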
dc.description.sponsorship | The work comprising this thesis was supported by ERC Starting Grant CapReal (335545) and ERC Consolidator Grant 4DRepLy (770784). | en_US |
dc.identifier.uri | https://diglib.eg.org:443/handle/10.2312/2632998 | |
dc.language.iso | en | en_US |
dc.publisher | Saarländische Universitäts- und Landesbibliothek | en_US |
dc.subject | motion capture | en_US |
dc.subject | human pose | en_US |
dc.subject | hci | en_US |
dc.subject | computer vision | en_US |
dc.subject | animation | en_US |
dc.subject | machine learning | en_US |
dc.title | Real-time 3D Human Body Pose Estimation from Monocular RGB Input | en_US |
dc.type | Thesis | en_US |