Retrieval-Augmented Primitive Representations for Compositional Zero-Shot Learning
Authors: Chenchen Jing, Yukun Li, Hao Chen, Chunhua Shen
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three widely-used datasets show the effectiveness of the proposed method. |
| Researcher Affiliation | Academia | Zhejiang University, China; School of Computer Science and Ningbo Institute, Northwestern Polytechnical University, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements about code availability or links to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate the proposed method on three CZSL datasets, i.e., MIT-States (Isola, Lim, and Adelson 2015), UT-Zappos (Yu and Grauman 2014), and C-GQA (Naeem et al. 2021). |
| Dataset Splits | Yes | The MIT-States... The dataset contains 1,262 seen and 300/400 unseen compositions for training and validation/testing, respectively. The UT-Zappos... The dataset is split into 83 seen and 15/18 unseen compositions for training and validation/testing. The C-GQA... The dataset is split into 5,592 seen and 1,040/923 unseen compositions for training and validation/testing, respectively. The detailed dataset statistics are shown in Table 2. |
| Hardware Specification | Yes | A single NVIDIA RTX 3090 GPU is used for training and testing. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for it or any other software libraries used. |
| Experiment Setup | Yes | For UT-Zappos, the hyperparameters λ1, λ2, and λ3 in the losses are set to 0.8, 5.0, and 1.0; for MIT-States, to 0.2, 1.0, and 0.1; for C-GQA, to 0.2, 5.0, and 0.1. The number of retrieved images K is set to 32 for UT-Zappos and 16 for both MIT-States and C-GQA. The number of images ND per primitive in database construction is set to 128 for UT-Zappos and 16 for both MIT-States and C-GQA, since MIT-States and C-GQA have many more classes than UT-Zappos. The number of selected images M for the retrieval loss is set to 256 for UT-Zappos and 512 for both MIT-States and C-GQA. The weight of aggregated features β is set to 0.8 for UT-Zappos and 0.5 for both MIT-States and C-GQA. The training epochs for each dataset are set to 20. |
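For reference, the per-dataset hyperparameters reported in the Experiment Setup row can be collected into a plain config sketch (the dict and key names below are illustrative, not from the paper's code, which is not released):

```python
# Sketch of the reported hyperparameters per dataset.
# Keys: loss weights lambda1-3, retrieved images K, database images per
# primitive N_D, selected images M for the retrieval loss, and the
# aggregated-feature weight beta. Names are our own shorthand.
HYPERPARAMS = {
    "UT-Zappos":  {"lambda1": 0.8, "lambda2": 5.0, "lambda3": 1.0,
                   "K": 32, "N_D": 128, "M": 256, "beta": 0.8},
    "MIT-States": {"lambda1": 0.2, "lambda2": 1.0, "lambda3": 0.1,
                   "K": 16, "N_D": 16,  "M": 512, "beta": 0.5},
    "C-GQA":      {"lambda1": 0.2, "lambda2": 5.0, "lambda3": 0.1,
                   "K": 16, "N_D": 16,  "M": 512, "beta": 0.5},
}

EPOCHS = 20  # same training length reported for all three datasets
```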