Learning Dynamic 3D Geometry and Texture for Video Face Swapping

dc.contributor.author: Otto, Christopher
dc.contributor.author: Naruniec, Jacek
dc.contributor.author: Helminger, Leonhard
dc.contributor.author: Etterlin, Thomas
dc.contributor.author: Mignone, Graziana
dc.contributor.author: Chandran, Prashanth
dc.contributor.author: Zoss, Gaspard
dc.contributor.author: Schroers, Christopher
dc.contributor.author: Gross, Markus
dc.contributor.author: Gotardo, Paulo
dc.contributor.author: Bradley, Derek
dc.contributor.author: Weber, Romann
dc.contributor.editor: Umetani, Nobuyuki
dc.contributor.editor: Wojtan, Chris
dc.contributor.editor: Vouga, Etienne
dc.date.accessioned: 2022-10-04T06:42:04Z
dc.date.available: 2022-10-04T06:42:04Z
dc.date.issued: 2022
dc.description.abstract: Face swapping is the process of applying a source actor's appearance to a target actor's performance in a video. This is a challenging visual effect that has seen increasing demand in film and television production. Recent work has shown that data-driven methods based on deep learning can produce compelling effects at production quality in a fraction of the time required for a traditional 3D pipeline. However, the dominant approach operates only on 2D imagery without reference to the underlying facial geometry or texture, resulting in poor generalization under novel viewpoints and little artistic control. Methods that do incorporate geometry rely on pre-learned facial priors that do not adapt well to particular geometric features of the source and target faces. We approach the problem of face swapping from the perspective of learning simultaneous convolutional facial autoencoders for the source and target identities, using a shared encoder network with identity-specific decoders. The key novelty in our approach is that each decoder first lifts the latent code into a 3D representation, comprising a dynamic face texture and a deformable 3D face shape, before projecting this 3D face back onto the input image using a differentiable renderer. The coupled autoencoders are trained only on videos of the source and target identities, without requiring 3D supervision. By leveraging the learned 3D geometry and texture, our method achieves face swapping with higher quality than when using off-the-shelf monocular 3D face reconstruction, and an overall lower FID score than state-of-the-art 2D methods. Furthermore, our 3D representation allows for efficient artistic control over the result, which can be hard to achieve with existing 2D approaches.
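The abstract's coupled-autoencoder design (a shared encoder, identity-specific decoders that lift the latent code to a 3D shape plus a dynamic texture, and a projection back to image space) can be illustrated with a minimal sketch. This is not the paper's implementation: all dimensions, class names, and the linear maps standing in for the paper's convolutional networks and differentiable renderer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTS = 100          # hypothetical mesh resolution
LATENT = 16            # hypothetical latent-code size
IMG = 32 * 32 * 3      # flattened input face crop (toy size)

class SharedEncoder:
    """Maps a face crop to a latent code. A linear map stands in for the
    shared convolutional encoder described in the abstract."""
    def __init__(self):
        self.W = rng.standard_normal((LATENT, IMG)) * 0.01

    def __call__(self, x):
        return self.W @ x

class IdentityDecoder:
    """Identity-specific decoder: lifts the latent code to a deformable
    3D shape (neutral mesh + per-vertex offsets) and a dynamic texture."""
    def __init__(self, base_shape):
        self.base_shape = base_shape  # (N_VERTS, 3) neutral mesh
        self.W_shape = rng.standard_normal((N_VERTS * 3, LATENT)) * 0.01
        self.W_tex = rng.standard_normal((IMG, LATENT)) * 0.01

    def __call__(self, z):
        offsets = (self.W_shape @ z).reshape(N_VERTS, 3)
        texture = (self.W_tex @ z).reshape(32, 32, 3)
        return self.base_shape + offsets, texture

def project(verts):
    """Toy stand-in for the differentiable renderer: orthographic
    projection of 3D vertices onto the image plane (drop z)."""
    return verts[:, :2]

# Face swap at inference time: encode a *target* frame with the shared
# encoder, then decode with the *source* identity's decoder.
encoder = SharedEncoder()
neutral = rng.standard_normal((N_VERTS, 3))
decoder_source = IdentityDecoder(neutral)
decoder_target = IdentityDecoder(neutral)

target_frame = rng.standard_normal(IMG)
z = encoder(target_frame)
shape_3d, texture = decoder_source(z)   # source appearance, target performance
verts_2d = project(shape_3d)
```

The intermediate 3D quantities (`shape_3d`, `texture`) are what give the method its claimed artistic control: they can be inspected or edited before projection, which a purely 2D pipeline does not expose.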
dc.description.number: 7
dc.description.sectionheaders: Digital Human
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 41
dc.identifier.doi: 10.1111/cgf.14705
dc.identifier.issn: 1467-8659
dc.identifier.pages: 611-622 (12 pages)
dc.identifier.uri: https://doi.org/10.1111/cgf.14705
dc.identifier.uri: https://diglib.eg.org:443/handle/10.1111/cgf14705
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd.
dc.subject: CCS Concepts: Computing methodologies → Image manipulation; Rendering; Neural Networks
dc.title: Learning Dynamic 3D Geometry and Texture for Video Face Swapping
Files
Original bundle (3 files):
- v41i7pp611-622.pdf (22.94 MB, Adobe Portable Document Format)
- paper1151_mm.mp4 (177.86 MB, unknown data format)
- paper1151_supplemental_material.pdf (47.15 MB, Adobe Portable Document Format)