Spatially and Temporally Optimized Audio-Driven Talking Face Generation

DC Field | Value | Language
dc.contributor.author | Dong, Biao | en_US
dc.contributor.author | Ma, Bo-Yao | en_US
dc.contributor.author | Zhang, Lei | en_US
dc.contributor.editor | Chen, Renjie | en_US
dc.contributor.editor | Ritschel, Tobias | en_US
dc.contributor.editor | Whiting, Emily | en_US
dc.date.accessioned | 2024-10-13T18:08:26Z
dc.date.available | 2024-10-13T18:08:26Z
dc.date.issued | 2024
dc.description.abstract | Audio-driven talking face generation is essentially a cross-modal mapping from audio to video frames. The main challenge lies in the intricate one-to-many mapping, which affects lip-sync accuracy. In addition, the loss of facial details during image reconstruction often results in visual artifacts in the generated video. To overcome these challenges, this paper proposes enhancing the quality of generated talking faces with a new spatio-temporal consistency. Specifically, temporal consistency is achieved through the consecutive frames of each phoneme, which form temporal modules that exhibit similar changes in lip appearance. This allows the lip movement to be adjusted adaptively for accurate sync. Spatial consistency pertains to the uniform distribution of textures within local regions, which form spatial modules and regulate the texture distribution in the generator. This yields fine details in the reconstructed facial images. Extensive experiments show that our method can generate more natural talking faces than previous state-of-the-art methods, in terms of both lip-sync accuracy and realistic facial details. | en_US
dc.description.number | 7
dc.description.sectionheaders | Human and Character Animation
dc.description.seriesinformation | Computer Graphics Forum
dc.description.volume | 43
dc.identifier.doi | 10.1111/cgf.15228
dc.identifier.issn | 1467-8659
dc.identifier.pages | 11 pages
dc.identifier.uri | https://doi.org/10.1111/cgf.15228
dc.identifier.uri | https://diglib.eg.org/handle/10.1111/cgf15228
dc.publisher | The Eurographics Association and John Wiley & Sons Ltd. | en_US
dc.rights | Attribution 4.0 International License
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/
dc.subject | CCS Concepts: Animation → Facial Animation; Imaging & Video → Image/Video Editing
dc.subject | Animation → Facial Animation
dc.subject | Imaging & Video → Image/Video Editing
dc.title | Spatially and Temporally Optimized Audio-Driven Talking Face Generation | en_US
Files
Original bundle
Name: cgf15228.pdf
Size: 8.96 MB
Format: Adobe Portable Document Format