"Wild West" of Evaluating Speech-Driven 3D Facial Animation Synthesis: A Benchmark Study

Date
2025
Publisher
The Eurographics Association and John Wiley & Sons Ltd.
Abstract
Research on audio-driven 3D facial animation has accelerated rapidly, with numerous papers published in a short span of time. This surge has garnered significant attention from both academia and industry because of its potential applications to digital humans. Various approaches, both deterministic and non-deterministic, have been explored, building on foundational advances in deep learning algorithms. However, there remains no consensus among researchers on standardized methods for evaluating these techniques. Additionally, rather than converging on a common set of datasets and objective metrics suited to specific methods, recent works exhibit considerable variation in experimental setups. This inconsistency complicates the research landscape, making it difficult to establish a streamlined evaluation process and rendering many cross-paper comparisons challenging. Moreover, the common practice of A/B testing in perceptual studies focuses on only two common metrics and is not sufficient for non-deterministic and emotion-enabled approaches. The lack of correlation between subjective and objective metrics indicates a need for critical analysis in this space. In this study, we address these issues by benchmarking state-of-the-art deterministic and non-deterministic models, using a consistent experimental setup across a carefully curated set of objective metrics and datasets. We also conduct a perceptual user study to assess whether subjective perceptual metrics align with the objective metrics. Our findings indicate that model rankings do not necessarily generalize across datasets, and that subjective metric ratings are not always consistent with their corresponding objective metrics. The supplementary video, edited code scripts for training on different datasets, and documentation related to this benchmark study are publicly available at https://galib360.github.io/face-benchmark-project/.
Description

CCS Concepts: Computing methodologies → Neural networks; Animation; Human-centered computing → User studies

        
@article{10.1111:cgf.70073,
  journal = {Computer Graphics Forum},
  title = {{"Wild West" of Evaluating Speech-Driven 3D Facial Animation Synthesis: A Benchmark Study}},
  author = {Haque, Kazi Injamamul and Pavlou, Alkiviadis and Yumak, Zerrin},
  year = {2025},
  publisher = {The Eurographics Association and John Wiley \& Sons Ltd.},
  ISSN = {1467-8659},
  DOI = {10.1111/cgf.70073}
}
Citation