Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation

Abstract
Face registration deforms a template mesh to closely fit a 3D face scan. Its quality commonly degrades in non-skin regions (e.g., hair, beard, accessories), because the optimized template-to-scan distance pulls the template mesh towards the noisy scan surface. Improving registration quality therefore requires a clean separation of skin and non-skin regions on the scan mesh. Existing image-based (2D) and scan-based (3D) segmentation methods, however, perform poorly: image-based methods output multi-view-inconsistent masks and cannot account for scan inaccuracies or scan-image misalignment, while scan-based methods suffer from lower spatial resolution than images. In this work, we introduce a novel method that accurately separates skin from non-skin geometry on 3D human head scans. Our method extracts features from multi-view images using a frozen image foundation model and aggregates them in 3D. These lifted 2D features are then fused with 3D geometric features extracted from the scan mesh to predict a segmentation mask directly on the scan mesh. We show that our segmentations improve registration accuracy over purely 2D and purely 3D segmentation methods by 8.89% and 14.3%, respectively. Although trained only on synthetic data, our model generalizes well to real data.
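For readers curious how the 2D-to-3D feature lifting and fusion described above could be wired up, the sketch below shows one plausible realization: per-view feature maps from a frozen image backbone are bilinearly sampled at projected vertex locations, averaged with visibility weights, concatenated with per-vertex geometric features, and mapped to per-vertex skin logits. All module choices, tensor shapes, and names (e.g., Pixels2PointsSketch, lift_2d_features) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class Pixels2PointsSketch(nn.Module):
    """Minimal sketch of a 2D/3D feature fusion pipeline (illustrative only)."""

    def __init__(self, img_feat_dim=384, geo_feat_dim=64, hidden_dim=128):
        super().__init__()
        # Placeholder 3D geometric encoder operating on raw vertex positions.
        self.geo_encoder = nn.Sequential(
            nn.Linear(3, geo_feat_dim), nn.ReLU(),
            nn.Linear(geo_feat_dim, geo_feat_dim),
        )
        # Fusion head: lifted 2D features + 3D features -> per-vertex skin logit.
        self.fusion_head = nn.Sequential(
            nn.Linear(img_feat_dim + geo_feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def lift_2d_features(self, img_feats, vertex_uv, visibility):
        """Aggregate per-view image features onto scan vertices.

        img_feats:  (V, C, H, W) frozen foundation-model feature maps, one per view.
        vertex_uv:  (V, N, 2) projected vertex coordinates in [-1, 1] per view.
        visibility: (V, N) per-view vertex visibility weights in [0, 1].
        """
        num_views = img_feats.shape[0]
        sampled = []
        for v in range(num_views):
            # Bilinearly sample the feature map at projected vertex locations.
            grid = vertex_uv[v].view(1, 1, -1, 2)           # (1, 1, N, 2)
            feats = nn.functional.grid_sample(
                img_feats[v:v + 1], grid, align_corners=False
            )                                               # (1, C, 1, N)
            sampled.append(feats[0, :, 0].t())              # (N, C)
        sampled = torch.stack(sampled)                      # (V, N, C)
        # Visibility-weighted average over views yields one feature per vertex.
        w = visibility.unsqueeze(-1)                        # (V, N, 1)
        return (sampled * w).sum(0) / w.sum(0).clamp(min=1e-6)

    def forward(self, img_feats, vertex_uv, visibility, vertices):
        lifted = self.lift_2d_features(img_feats, vertex_uv, visibility)   # (N, C)
        geo = self.geo_encoder(vertices)                                   # (N, G)
        return self.fusion_head(torch.cat([lifted, geo], dim=-1))          # (N, 1) skin logits

The per-vertex logits could then be thresholded to obtain the skin mask used to downweight non-skin regions during registration; the actual feature backbone, view aggregation, and training setup in the paper may differ.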
Citation

@inproceedings{10.2312:egs.20251037,
  booktitle = {Eurographics 2025 - Short Papers},
  editor    = {Ceylan, Duygu and Li, Tzu-Mao},
  title     = {{Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation}},
  author    = {Chen, Victoria Yue and Wang, Daoye and Garbin, Stephan and Bednarik, Jan and Winberg, Sebastian and Bolkart, Timo and Beeler, Thabo},
  year      = {2025},
  publisher = {The Eurographics Association},
  ISSN      = {1017-4656},
  ISBN      = {978-3-03868-268-4},
  DOI       = {10.2312/egs.20251037}
}