Spatially and Temporally Optimized Audio-Driven Talking Face Generation

Dong, Biao; Ma, Bo-Yao; Zhang, Lei


dc.contributor.author	Dong, Biao	en_US
dc.contributor.author	Ma, Bo-Yao	en_US
dc.contributor.author	Zhang, Lei	en_US
dc.contributor.editor	Chen, Renjie	en_US
dc.contributor.editor	Ritschel, Tobias	en_US
dc.contributor.editor	Whiting, Emily	en_US
dc.date.accessioned	2024-10-13T18:08:26Z
dc.date.available	2024-10-13T18:08:26Z
dc.date.issued	2024
dc.description.abstract	Audio-driven talking face generation is essentially a cross-modal mapping from audio to video frames. The main challenge lies in the intricate one-to-many nature of this mapping, which degrades lip-sync accuracy. In addition, the loss of facial details during image reconstruction often produces visual artifacts in the generated video. To overcome these challenges, this paper proposes to enhance the quality of generated talking faces with a new spatio-temporal consistency. Specifically, temporal consistency is achieved through the consecutive frames of each phoneme, which form temporal modules that exhibit similar changes in lip appearance. This allows the lip movement to be adjusted adaptively for accurate sync. Spatial consistency pertains to the uniform distribution of textures within local regions, which form spatial modules that regulate the texture distribution in the generator. This yields fine details in the reconstructed facial images. Extensive experiments show that our method generates more natural talking faces than previous state-of-the-art methods, with both accurate lip sync and realistic facial details.	en_US
dc.description.number	7
dc.description.sectionheaders	Human and Character Animation
dc.description.seriesinformation	Computer Graphics Forum
dc.description.volume	43
dc.identifier.doi	10.1111/cgf.15228
dc.identifier.issn	1467-8659
dc.identifier.pages	11 pages
dc.identifier.uri	https://doi.org/10.1111/cgf.15228
dc.identifier.uri	https://diglib.eg.org/handle/10.1111/cgf15228
dc.publisher	The Eurographics Association and John Wiley & Sons Ltd.	en_US
dc.subject	CCS Concepts: Animation → Facial Animation; Imaging & Video → Image/Video Editing
dc.subject	Animation → Facial Animation
dc.subject	Imaging & Video → Image/Video Editing
dc.title	Spatially and Temporally Optimized Audio-Driven Talking Face Generation	en_US
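
The abstract's notion of a temporal module (the run of consecutive frames that share one phoneme) can be illustrated with a small sketch. This is not the authors' implementation: it assumes per-frame phoneme labels are already available from some aligner, and the helpers group_phoneme_runs and run_consistency_penalty, as well as the squared-deviation penalty, are hypothetical choices made only to show how frames might be grouped and how similar lip features within a group could be encouraged.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[str, int, int]  # (phoneme, start frame, end frame exclusive)


def group_phoneme_runs(frame_phonemes: List[str]) -> List[Run]:
    """Hypothetical helper: group consecutive frames with the same phoneme label."""
    runs: List[Run] = []
    start = 0
    for phoneme, group in groupby(frame_phonemes):
        length = sum(1 for _ in group)
        runs.append((phoneme, start, start + length))
        start += length
    return runs


def run_consistency_penalty(features: List[List[float]], runs: List[Run]) -> float:
    """Hypothetical penalty (an assumption, not the paper's loss): mean squared
    deviation of each frame's lip feature vector from the mean of its run."""
    total = 0.0
    for _, s, e in runs:
        segment = features[s:e]
        dim = len(segment[0])
        mean = [sum(f[d] for f in segment) / len(segment) for d in range(dim)]
        for f in segment:
            total += sum((f[d] - mean[d]) ** 2 for d in range(dim))
    return total / len(features)


runs = group_phoneme_runs(["sil", "sil", "AA", "AA", "AA", "B", "sil"])
print(runs)  # [('sil', 0, 2), ('AA', 2, 5), ('B', 5, 6), ('sil', 6, 7)]
```

Grouping by runs rather than by phoneme identity keeps two separate occurrences of the same phoneme (the two "sil" segments above) in distinct modules, which matches the abstract's emphasis on consecutive frames.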

Files

Original bundle

Name:: cgf15228.pdf
Size:: 8.96 MB
Format:: Adobe Portable Document Format

Collections

43-Issue 7