Zero-Shot Sketch-Based Image Retrieval via Modality Capacity Guidance

Authors: Yanghong Zhou, Dawei Liu, P. Y. Mok

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiment results have demonstrated our significant performance improvements, achieving an increase of 7.3%/3.2% and 19.9%/10.3% in terms of mAP@200/P@200 compared to the state-of-the-art models on CLIP and DINO, respectively, on the Sketchy-Ext dataset (split 2).
Researcher Affiliation | Academia | 1 School of Fashion and Textiles, The Hong Kong Polytechnic University; 2 Research Institute for Intelligent Wearable Systems, The Hong Kong Polytechnic University; 3 Research Centre of Textiles for Future Fashion, The Hong Kong Polytechnic University
Pseudocode | No | The paper provides mathematical equations for loss functions but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Data, code, and supplementary information are available at https://github.com/YHdian0716/ZS-SBIR-MCC.git
Open Datasets | Yes | We evaluated the effectiveness of our proposed modality capacity constraint loss on three widely-used benchmarks: Sketchy-Ext [Liu et al., 2017], TUBerlin-Ext [Eitz et al., 2012] and a subset of QuickDraw-Ext [Dey et al., 2019a].
Dataset Splits | Yes | For data partitioning, we also followed [Dey et al., 2019b] to divide Sketchy-Ext [Liu et al., 2017] into 100/104 categories for training and 25/21 categories for testing, denoted as Sketchy-Ext Split 1 and Sketchy-Ext Split 2, respectively. We utilized 25 categories from Sketchy-Ext [Liu et al., 2017] and 30 categories from TUBerlin-Ext [Eitz et al., 2012] for testing, and the remaining 100/220 categories for training.
Hardware Specification | Yes | All the experiments were conducted with PyTorch on an 11 GB Nvidia RTX 3080-Ti GPU.
Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | We used the Adam optimizer to train the models with a learning rate of lr = 1e-4, β1 = 0.9 and β2 = 0.999. The input size of the images was 224 × 224. The models were trained for 60 epochs with a batch size of 64. During the training stage, all the parameters of the models were frozen except for the layer normalization. The loss weights were set as λ1 = λ4 = 1, λ2 = λ5 = 4 and λ3 = λ6 = 8. The margin µ was set as 0.3.
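The Experiment Setup row describes an unusual fine-tuning recipe: everything frozen except layer normalization, trained with Adam at lr = 1e-4 and betas (0.9, 0.999). The snippet below is a minimal PyTorch sketch of that configuration; the `backbone` module is a hypothetical stand-in, not the paper's CLIP/DINO encoder.

```python
import torch
import torch.nn as nn

def freeze_except_layernorm(model: nn.Module) -> list:
    """Freeze every parameter, then re-enable only LayerNorm parameters."""
    for p in model.parameters():
        p.requires_grad = False
    trainable = []
    for m in model.modules():
        if isinstance(m, nn.LayerNorm):
            for p in m.parameters():
                p.requires_grad = True
                trainable.append(p)
    return trainable

# Hypothetical stand-in backbone with LayerNorm layers (the paper uses CLIP/DINO).
backbone = nn.Sequential(
    nn.Linear(768, 768),
    nn.LayerNorm(768),
    nn.Linear(768, 512),
    nn.LayerNorm(512),
)

# Optimize only the LayerNorm parameters, with the reported Adam settings.
trainable_params = freeze_except_layernorm(backbone)
optimizer = torch.optim.Adam(trainable_params, lr=1e-4, betas=(0.9, 0.999))

# Other reported hyperparameters, kept here for reference.
loss_weights = {"l1": 1, "l2": 4, "l3": 8, "l4": 1, "l5": 4, "l6": 8}
margin = 0.3
```

In a real run, the training loop would combine the six weighted losses and iterate for 60 epochs with batch size 64 on 224 × 224 inputs, as reported.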
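The mAP@200 and P@200 figures quoted in the Research Type row are standard ranked-retrieval metrics. The sketch below gives generic definitions of precision@k and average precision@k (mAP@k is the latter averaged over all queries); it is an illustration of the metrics, not the authors' evaluation code.

```python
import numpy as np

def precision_at_k(relevant: np.ndarray, k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant.
    `relevant` is a 0/1 array ordered by retrieval rank."""
    return float(relevant[:k].sum()) / k

def average_precision_at_k(relevant: np.ndarray, k: int) -> float:
    """Mean of precision@i over the ranks i <= k where a relevant item appears."""
    rel = relevant[:k]
    hits = rel.nonzero()[0]
    if len(hits) == 0:
        return 0.0
    precisions = [rel[: i + 1].sum() / (i + 1) for i in hits]
    return float(np.mean(precisions))

# Toy example: ranks 1, 3 and 4 are relevant among the top 5 retrieved items.
ranked_relevance = np.array([1, 0, 1, 1, 0])
p_at_5 = precision_at_k(ranked_relevance, 5)           # 3/5 = 0.6
ap_at_5 = average_precision_at_k(ranked_relevance, 5)  # (1 + 2/3 + 3/4)/3 ≈ 0.806
```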