"Wild West" of Evaluating Speech-Driven 3D Facial Animation Synthesis: A Benchmark Study

dc.contributor.author: Haque, Kazi Injamamul
dc.contributor.author: Pavlou, Alkiviadis
dc.contributor.author: Yumak, Zerrin
dc.contributor.editor: Bousseau, Adrien
dc.contributor.editor: Day, Angela
dc.date.accessioned: 2025-05-09T09:16:06Z
dc.date.available: 2025-05-09T09:16:06Z
dc.date.issued: 2025
dc.description.abstract: Recent advancements in the field of audio-driven 3D facial animation have accelerated rapidly, with numerous papers published in a short span of time. This surge in research has garnered significant attention from both academia and industry, owing to its potential applications for digital humans. Various approaches, both deterministic and non-deterministic, have been explored based on foundational advancements in deep learning algorithms. However, there remains no consensus among researchers on standardized methods for evaluating these techniques. Additionally, rather than converging on a common set of datasets and objective metrics suited to specific methods, recent works exhibit considerable variation in experimental setups. This inconsistency complicates the research landscape, making it difficult to establish a streamlined evaluation process and rendering many cross-paper comparisons challenging. Moreover, the common practice of A/B testing in perceptual studies focuses on only two common metrics and is not sufficient for non-deterministic and emotion-enabled approaches. The lack of correlation between subjective and objective metrics indicates a need for critical analysis in this space. In this study, we address these issues by benchmarking state-of-the-art deterministic and non-deterministic models, utilizing a consistent experimental setup across a carefully curated set of objective metrics and datasets. We also conduct a perceptual user study to assess whether subjective perceptual metrics align with the objective metrics. Our findings indicate that model rankings do not necessarily generalize across datasets, and subjective metric ratings are not always consistent with their corresponding objective metrics. The supplementary video, edited code scripts for training on different datasets, and documentation related to this benchmark study are made publicly available at https://galib360.github.io/face-benchmark-project/.
dc.description.number: 2
dc.description.sectionheaders: Face-First for Digital Avatars
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 44
dc.identifier.doi: 10.1111/cgf.70073
dc.identifier.issn: 1467-8659
dc.identifier.pages: 13 pages
dc.identifier.uri: https://doi.org/10.1111/cgf.70073
dc.identifier.uri: https://diglib.eg.org/handle/10.1111/cgf70073
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd.
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/
dc.subject: CCS Concepts: Computing methodologies → Neural networks; Animation; Human-centered computing → User studies
dc.subject: Computing methodologies → Neural networks
dc.subject: Animation
dc.subject: Human-centered computing → User studies
dc.title: "Wild West" of Evaluating Speech-Driven 3D Facial Animation Synthesis: A Benchmark Study
Files
Original bundle (2 files)
Name: cgf70073.pdf
Size: 615.04 KB
Format: Adobe Portable Document Format
Name: paper1227_1.mp4
Size: 90.18 MB
Format: Video MP4