Expressive Speech Animation Synthesis with Phoneme-Level Controls
Date
2008
Authors
Deng, Z.; Neumann, U.
Journal Title
Computer Graphics Forum
Journal ISSN
1467-8659
Volume Title
Publisher
The Eurographics Association and Blackwell Publishing Ltd
Abstract
This paper presents a novel data-driven expressive speech animation synthesis system with phoneme-level controls. The system is built on a pre-recorded facial motion capture database, in which an actress was directed to recite a pre-designed corpus with four facial expressions (neutral, happiness, anger, and sadness). Given new phoneme-aligned expressive speech and its emotion modifiers as inputs, a constrained dynamic programming algorithm searches the processed facial motion database for the best-matched captured motion clips by minimizing a cost function. Users can optionally specify hard constraints (motion-node constraints for expressing phoneme utterances) and soft constraints (emotion modifiers) to guide this search. We also introduce a phoneme-Isomap interface for visualizing and interacting with phoneme clusters, which are typically composed of thousands of facial motion capture frames. On top of this novel visualization interface, users can conveniently remove contaminated motion subsequences from a large facial motion dataset. Facial animation synthesis experiments and objective comparisons between synthesized and captured facial motion show that the system is effective for producing realistic expressive speech animations.
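The constrained search described in the abstract can be pictured as a Viterbi-style dynamic program: for each phoneme there is a set of candidate motion nodes, hard constraints pin a phoneme to one user-chosen node, and the path minimizing concatenation-smoothness plus emotion-mismatch cost is selected. The sketch below is an illustrative assumption, not the authors' implementation; the function names, cost terms, and data layout are all hypothetical.

```python
# Minimal sketch (hypothetical, not the paper's code) of a constrained
# dynamic-programming search over per-phoneme candidate motion nodes.
def dp_search(candidates, transition_cost, emotion_cost, hard_constraints=None):
    """candidates: list over phonemes; each entry is a list of motion-node ids.
    transition_cost(a, b): smoothness penalty for concatenating clip a -> b.
    emotion_cost(n): soft penalty for node n mismatching the emotion modifier.
    hard_constraints: optional {phoneme_index: required_node_id} (hard constraints).
    Returns the minimum-cost node sequence."""
    hard_constraints = hard_constraints or {}
    # Hard motion-node constraints prune a step's candidate set to one node.
    cands = [[hard_constraints[i]] if i in hard_constraints else nodes
             for i, nodes in enumerate(candidates)]
    # best[n] = (cost, path) of the cheapest path ending at node n.
    best = {n: (emotion_cost(n), [n]) for n in cands[0]}
    for nodes in cands[1:]:
        nxt = {}
        for n in nodes:
            # Cheapest predecessor under the transition (smoothness) cost.
            cost, path = min(((c + transition_cost(p, n), pth)
                              for p, (c, pth) in best.items()),
                             key=lambda t: t[0])
            nxt[n] = (cost + emotion_cost(n), path + [n])
        best = nxt
    return min(best.values(), key=lambda t: t[0])[1]
```

With toy integer "nodes", `dp_search([[0, 10], [1, 9], [2, 8]], lambda a, b: abs(a - b), lambda n: 0.1 * n)` selects the low-cost path `[0, 1, 2]`, while adding `hard_constraints={1: 9}` forces the search through node 9 and yields `[10, 9, 8]`.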
Description
@article{10.1111:j.1467-8659.2008.01192.x,
journal = {Computer Graphics Forum},
title = {{Expressive Speech Animation Synthesis with Phoneme-Level Controls}},
author = {Deng, Z. and Neumann, U.},
year = {2008},
publisher = {The Eurographics Association and Blackwell Publishing Ltd},
ISSN = {1467-8659},
DOI = {10.1111/j.1467-8659.2008.01192.x}
}