PG2025 Conference Papers, Posters, and Demos
Browsing PG2025 Conference Papers, Posters, and Demos by Subject "based indexing and retrieval"
PF-UCDR: A Local-Aware RGB-Phase Fusion Network with Adaptive Prompts for Universal Cross-Domain Retrieval (The Eurographics Association, 2025)

Authors: Wu, Yiqi; Hu, Ronglei; Wu, Huachao; He, Fazhi; Zhang, Dejun; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene

Abstract: Universal Cross-Domain Retrieval (UCDR) aims to match semantically related images across domains and categories not seen during training. While vision-language pre-trained models offer strong global alignment, we are inspired by the observation that local structures, such as shapes, contours, and textures, often remain stable across domains, and thus propose to model them explicitly at the patch level. We present PF-UCDR, a framework built upon frozen vision-language backbones that performs patch-wise fusion of RGB and phase representations. Central to our design is a Fusing Vision Encoder, which applies masked cross-attention to spatially aligned RGB and phase patches, enabling fine-grained integration of complementary appearance and structural cues. Additionally, we incorporate adaptive visual prompts that condition image encoding based on domain and class context. Local and global fusion modules aggregate these enriched features, and a two-stage training strategy progressively optimizes alignment and retrieval objectives. Experiments on standard UCDR benchmarks demonstrate that PF-UCDR significantly outperforms existing methods, validating the effectiveness of structure-aware local fusion grounded in multimodal pretraining. Our code is publicly available at https://github.com/djzgroup/PF-UCDR.
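The abstract's central idea — extracting a phase representation of an image and fusing it with RGB features patch by patch via cross-attention — can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's architecture: the actual method uses masked cross-attention over frozen vision-language backbone features, whereas this sketch uses a Fourier phase-only reconstruction and a single-head attention over raw pixel patches.

```python
import numpy as np

def phase_image(img):
    # Phase-only reconstruction: per-channel 2D FFT, discard magnitude
    # (set to 1), keep the phase angle, then inverse-transform.
    # Phase preserves structural cues (edges, contours) while dropping
    # domain-specific appearance statistics -- the intuition behind
    # using phase as a domain-stable signal.
    F = np.fft.fft2(img, axes=(0, 1))
    return np.real(np.fft.ifft2(np.exp(1j * np.angle(F)), axes=(0, 1)))

def patchify(img, p=8):
    # Split an (H, W, C) image into non-overlapping p x p patches,
    # flattened to rows of a (num_patches, p*p*C) matrix.
    h, w, c = img.shape
    return (img.reshape(h // p, p, w // p, p, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(-1, p * p * c))

def cross_attention(q, kv):
    # Single-head cross-attention: each query patch (one modality)
    # attends over all key/value patches (the other modality).
    d = q.shape[-1]
    scores = q @ kv.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))           # stand-in for an input image
rgb_patches = patchify(img)             # (16, 192)
phase_patches = patchify(phase_image(img))

# RGB queries attend to phase keys/values; a residual keeps appearance.
fused = rgb_patches + cross_attention(rgb_patches, phase_patches)
print(fused.shape)                      # (16, 192)
```

Spatial alignment comes for free here because both modalities are patchified on the same grid; the paper's masking would additionally restrict which phase patches each RGB patch may attend to.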