Modeling Sketches both Semantically and Structurally for Zero-Shot Sketch-Based Image Retrieval is Better
Date
2024
Publisher
The Eurographics Association
Abstract
Sketches, as representations of human thought, are abstract, yet they are also structured because they are presented as two-dimensional images. Modeling them from both semantic and structural perspectives is therefore reasonable and effective. For semantic capture, we compare the performance of two mainstream pre-trained models on the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) task and propose a new model, Semantic Net (SNET), built on Contrastive Language-Image Pre-training (CLIP) with a more effective fine-tuning strategy and a Semantic Preservation Module. We further propose three lightweight modules, Channels Fusion (CF), Layers Fusion (LF), and Semantic Structure Fusion (SSF), that give SNET a stronger ability to capture structure. Finally, we supervise the entire training process with a classification loss based on contrastive learning and a bidirectional triplet loss based on the cosine distance metric. We call the final model Semantic Structure Net (SSNET). Quantitative experiments show that both SNET and the enhanced SSNET achieve new state-of-the-art results (a 16% retrieval boost on the most difficult dataset, QuickDraw Ext). Visualization experiments also indirectly support our view of sketch modeling.
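The abstract does not spell out the bidirectional triplet loss, so the following is only a minimal sketch of one common reading: a sketch-anchored triplet term plus an image-anchored triplet term, both measured with cosine distance. The function names, the margin value of 0.2, and the way negatives are supplied are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss pushing the positive closer to the anchor than the negative."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


def bidirectional_triplet_loss(sketch, pos_image, neg_image, neg_sketch, margin=0.2):
    """Assumed form: sketch-anchored term plus image-anchored term.

    sketch / pos_image share a category; neg_image and neg_sketch
    come from other categories (hypothetical sampling scheme).
    """
    sketch_to_image = triplet_loss(sketch, pos_image, neg_image, margin)
    image_to_sketch = triplet_loss(pos_image, sketch, neg_sketch, margin)
    return sketch_to_image + image_to_sketch
```

In this reading, a well-aligned sketch and image pair with dissimilar negatives contributes zero loss, while a misaligned pair is penalized from both retrieval directions.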
Description
CCS Concepts: Computing methodologies → Visual content-based indexing and retrieval
@inproceedings{10.2312:pg.20241309,
booktitle = {Pacific Graphics Conference Papers and Posters},
editor = {Chen, Renjie and Ritschel, Tobias and Whiting, Emily},
title = {{Modeling Sketches both Semantically and Structurally for Zero-Shot Sketch-Based Image Retrieval is Better}},
author = {Jing, Jiansen and Liu, Yujie and Li, Mingyue and Xiao, Qian and Chai, Shijie},
year = {2024},
publisher = {The Eurographics Association},
ISBN = {978-3-03868-250-9},
DOI = {10.2312/pg.20241309}
}