Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection

Chandran, P.; Zoss, G.; Gotardo, P.; Bradley, D.

dc.contributor.author	Chandran, P.	en_US
dc.contributor.author	Zoss, G.	en_US
dc.contributor.author	Gotardo, P.	en_US
dc.contributor.author	Bradley, D.	en_US
dc.contributor.editor	Alliez, Pierre	en_US
dc.contributor.editor	Wimmer, Michael	en_US
dc.date.accessioned	2024-12-19T11:15:43Z
dc.date.available	2024-12-19T11:15:43Z
dc.date.issued	2024
dc.description.abstract	In this paper, we examine three important issues in the practical use of state‐of‐the‐art facial landmark detectors and show how a combination of specific architectural modifications can directly improve their accuracy and temporal stability. First, many facial landmark detectors require a face normalization step as a pre‐process, often accomplished by a separately trained neural network that crops and resizes the face in the input image. There is no guarantee that this pre‐trained network performs optimal face normalization for the task of landmark detection. Thus, we instead analyse the use of a spatial transformer network that is trained alongside the landmark detector in an unsupervised manner, jointly learning optimal face normalization and landmark detection with a single neural network. Second, we show that modifying the output head of the landmark predictor to infer landmarks in a canonical 3D space rather than directly in 2D can further improve accuracy. To convert the predicted 3D landmarks into screen space, we additionally predict the camera intrinsics and head pose from the input image. As a side benefit, this allows us to predict the 3D face shape from a given image using only 2D landmarks as supervision, which is useful for, among other things, determining landmark visibility. Third, when training a landmark detector on multiple datasets at the same time, annotation inconsistencies across datasets force the network to produce a sub‐optimal average. We propose to add a semantic correction network to address this issue. This additional lightweight neural network is trained alongside the landmark detector, without requiring any additional supervision. While the insights of this paper can be applied to most common landmark detectors, we specifically target a recently proposed continuous 2D landmark detector to demonstrate how each of our additions leads to meaningful improvements over the state‐of‐the‐art on standard benchmarks.	en_US
dc.description.number	6
dc.description.sectionheaders	Major Revision from Eurographics Conference
dc.description.seriesinformation	Computer Graphics Forum
dc.description.volume	43
dc.identifier.doi	10.1111/cgf.15126
dc.identifier.pages	13 pages
dc.identifier.uri	https://doi.org/10.1111/cgf.15126
dc.identifier.uri	https://diglib.eg.org/handle/10.1111/cgf15126
dc.publisher	© 2024 Eurographics ‐ The European Association for Computer Graphics and John Wiley & Sons Ltd.	en_US
dc.subject	animation
dc.subject	facial animation
dc.subject	image and video processing
dc.title	Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection	en_US
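
The second contribution in the abstract, predicting landmarks in a canonical 3D space and converting them to screen space with predicted camera intrinsics and head pose, reduces to a rigid transform followed by a perspective projection. The Python sketch below illustrates only that projection step, under stated assumptions: a pinhole camera with a single focal length and a principal point, no lens distortion, and a head pose given as Euler angles plus a translation. The function names, the Euler-angle convention and the example values are hypothetical illustrations, not the authors' implementation or parameterisation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def matmul3(a: Mat3, b: Mat3) -> Mat3:
    """Plain 3x3 matrix product a @ b."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def rotation_from_euler(yaw: float, pitch: float, roll: float) -> Mat3:
    """Hypothetical head-pose parameterisation (radians): R = Rz(roll) @ Ry(yaw) @ Rx(pitch)."""
    c_yaw, s_yaw = math.cos(yaw), math.sin(yaw)
    c_pit, s_pit = math.cos(pitch), math.sin(pitch)
    c_rol, s_rol = math.cos(roll), math.sin(roll)
    rx = ((1.0, 0.0, 0.0), (0.0, c_pit, -s_pit), (0.0, s_pit, c_pit))
    ry = ((c_yaw, 0.0, s_yaw), (0.0, 1.0, 0.0), (-s_yaw, 0.0, c_yaw))
    rz = ((c_rol, -s_rol, 0.0), (s_rol, c_rol, 0.0), (0.0, 0.0, 1.0))
    return matmul3(rz, matmul3(ry, rx))


def project_landmarks(
    points: Sequence[Vec3],
    rotation: Mat3,
    translation: Vec3,
    focal: float,
    principal: Tuple[float, float],
) -> List[Tuple[float, float]]:
    """Map canonical 3D landmarks to 2D screen space (assumed pinhole camera, no distortion)."""
    cx, cy = principal
    projected = []
    for p in points:
        # Head pose: rotate the canonical point, then translate it into camera space.
        xc = [sum(rotation[i][k] * p[k] for k in range(3)) + translation[i] for i in range(3)]
        if xc[2] <= 0.0:
            raise ValueError("landmark lies at or behind the camera plane")
        # Intrinsics: perspective divide, scale by the focal length, shift by the principal point.
        projected.append((focal * xc[0] / xc[2] + cx, focal * xc[1] / xc[2] + cy))
    return projected


if __name__ == "__main__":
    # Hypothetical canonical landmarks (metres) and camera parameters (pixels).
    canonical = [(0.0, 0.0, 0.0), (0.03, 0.0, 0.0), (0.0, 0.05, 0.0)]
    R = rotation_from_euler(yaw=0.2, pitch=0.0, roll=0.0)
    print(project_landmarks(canonical, R, (0.0, 0.0, 0.5), focal=1000.0, principal=(256.0, 256.0)))
```

Because every step above is differentiable with respect to the 3D points, the pose and the intrinsics, a network that outputs all three can, in principle, be trained with 2D landmark losses alone, which is consistent with the abstract's remark that the 3D face shape is obtained using only 2D supervision.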

Files

Name: 17_cgf15126.pdf
Size: 18.09 MB
Format: Adobe Portable Document Format

Collections

43-Issue 6