Retrieval-Augmented Primitive Representations for Compositional Zero-Shot Learning

Authors: Chenchen Jing, Yukun Li, Hao Chen, Chunhua Shen

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on three widely-used datasets show the effectiveness of the proposed method.
Researcher Affiliation | Academia | 1 Zhejiang University, China; 2 School of Computer Science and Ningbo Institute, Northwestern Polytechnical University, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide explicit statements about code availability or links to a code repository for the described methodology.
Open Datasets | Yes | We evaluate the proposed method on three CZSL datasets, i.e., MIT-States (Isola, Lim, and Adelson 2015), UT-Zappos (Yu and Grauman 2014), and C-GQA (Naeem et al. 2021).
Dataset Splits | Yes | The MIT-States... The dataset contains 1,262 seen and 300/400 unseen compositions for training and validation/testing, respectively. The UT-Zappos... The dataset is split into 83 seen and 15/18 unseen compositions for training and validation/testing. The C-GQA... The dataset is split into 5,592 seen and 1,040/923 unseen compositions for training and validation/testing, respectively. The detailed dataset statistics are shown in Table 2.
Hardware Specification | Yes | A single NVIDIA RTX 3090 GPU is used for training and testing.
Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for it or any other software libraries used.
Experiment Setup | Yes | For UT-Zappos, the hyperparameters λ1, λ2, and λ3 in the losses are set to 0.8, 5.0, and 1.0. For MIT-States, the three hyperparameters are set to 0.2, 1.0, and 0.1. For C-GQA, the three hyperparameters are set to 0.2, 5.0, and 0.1. The number of retrieved images K is set to 32 for UT-Zappos and 16 for both MIT-States and C-GQA. The number of images ND of each primitive in database construction is set to 128 for UT-Zappos and 16 for both MIT-States and C-GQA, since MIT-States and C-GQA have far more classes than UT-Zappos. The number of selected images M for the retrieval loss is set to 256 for UT-Zappos and 512 for both MIT-States and C-GQA. The weight of aggregated features β is set to 0.8 for UT-Zappos and 0.5 for both MIT-States and C-GQA. The number of training epochs is set to 20 for every dataset.
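
For convenience, the setup quoted above can be collected into a per-dataset configuration. The sketch below is a minimal summary restating only the reported values; since the paper releases no code, all names here (`CONFIGS`, `lambda1`, `N_D`, etc.) are our own invented identifiers, not ones from the authors' implementation.

```python
# Per-dataset hyperparameters as reported in the paper.
# Illustrative summary only: all identifiers are invented for this sketch,
# since the authors do not release code.
CONFIGS = {
    "UT-Zappos": {
        "lambda1": 0.8,  # loss weight λ1
        "lambda2": 5.0,  # loss weight λ2
        "lambda3": 1.0,  # loss weight λ3
        "K": 32,         # number of retrieved images
        "N_D": 128,      # images per primitive in database construction
        "M": 256,        # selected images for the retrieval loss
        "beta": 0.8,     # weight of aggregated features
        "epochs": 20,    # training epochs
    },
    "MIT-States": {
        "lambda1": 0.2, "lambda2": 1.0, "lambda3": 0.1,
        "K": 16, "N_D": 16, "M": 512, "beta": 0.5, "epochs": 20,
    },
    "C-GQA": {
        "lambda1": 0.2, "lambda2": 5.0, "lambda3": 0.1,
        "K": 16, "N_D": 16, "M": 512, "beta": 0.5, "epochs": 20,
    },
}

if __name__ == "__main__":
    # Print each dataset's configuration for a quick sanity check.
    for name, cfg in CONFIGS.items():
        print(name, cfg)
```

Note how UT-Zappos, with far fewer classes, uses a larger per-primitive database (N_D = 128) but a smaller retrieval-loss pool (M = 256) than the other two datasets, consistent with the explanation quoted in the table.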