A Dynamic Learning Method towards Realistic Compositional Zero-Shot Learning
Authors: Xiaoming Hu, Zilei Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on benchmark datasets for both the conventional CZSL setting and the proposed RCZSL setting. The effectiveness of our method has been proven by empirical results, as it significantly outperformed both our baseline method and state-of-the-art approaches. |
| Researcher Affiliation | Academia | Xiaoming Hu, Zilei Wang*; University of Science and Technology of China, Hefei, China; cjdc@mail.ustc.edu.cn, zlwang@ustc.edu.cn |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 2) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for their method, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on three benchmark CZSL datasets, i.e., MIT-States (Isola, Lim, and Adelson 2015), UT-Zappos (Yu and Grauman 2014) and C-GQA (Naeem et al. 2021). [...] Therefore we developed the RCZSL benchmark dataset based on MIT-States, in which the included natural images better reflect real-world circumstances. |
| Dataset Splits | Yes | Then the entire dataset is split into a training set and a test set, with the latter containing images of unseen concepts, unseen compositions, unseen domains as well as their combinations. [...] We divide the entire dataset of images into the training, validation and test sets [...] A detailed description of the division of MIT-States-RCZSL can be found in Table 2. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | We conduct our method with the PyTorch (Paszke et al. 2019) framework. [...] The model is trained for 50 epochs using the Adam (Kingma and Ba 2014) optimizer. [...] we apply the CycleGAN model (Zhu et al. 2017) to conduct photo-to-art transfer, which is pre-trained using adversarial and cycle-consistency losses. Additionally, we perform photo-to-cartoon transfer using another GAN model (Wang and Yu 2020). The paper mentions software tools and frameworks but does not provide specific version numbers for them (e.g., the PyTorch release or the exact GAN model checkpoints). |
| Experiment Setup | Yes | The model is trained for 50 epochs using the Adam (Kingma and Ba 2014) optimizer with a learning rate of 1e-4 and weight decay of 5e-5. The number of dynamic kernels in the visual modulator is set as 4. The visual encoder and text encoder in CLIP and the ResNet18 backbone are fixed during the experiment for a fair comparison with prior works. The model that performs the best on the validation set is used to produce the final test results. A hedged sketch of this configuration is given below the table. |
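
Since no code is released, reproducing the experiment setup means re-implementing it from the row above. The following minimal PyTorch sketch wires the reported hyperparameters together. Only the numbers come from the paper (Adam, learning rate 1e-4, weight decay 5e-5, 50 epochs, 4 dynamic kernels, frozen CLIP encoders and ResNet18 backbone); the `VisualModulator` class, its gated mixture of linear kernels, the 512-dimensional feature size, and the placeholder loss are assumptions made for illustration, not the authors' implementation.

```python
import torch
from torch import nn


class VisualModulator(nn.Module):
    """Assumed stand-in for the paper's visual modulator with dynamic kernels."""

    def __init__(self, feat_dim: int = 512, num_kernels: int = 4):
        super().__init__()
        # One linear "kernel" per branch; the paper only states that the
        # number of dynamic kernels is 4, so the gated mixture below is an
        # assumption, not the authors' formulation.
        self.kernels = nn.ModuleList(
            nn.Linear(feat_dim, feat_dim) for _ in range(num_kernels)
        )
        self.gate = nn.Linear(feat_dim, num_kernels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.gate(x).softmax(dim=-1)                   # (B, K)
        outs = torch.stack([k(x) for k in self.kernels], dim=1)  # (B, K, D)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)         # (B, D)


model = VisualModulator(feat_dim=512)  # 512 assumed to match CLIP image features

# The paper freezes the CLIP visual/text encoders and the ResNet18 backbone,
# so only the newly added modules receive gradients (frozen parts omitted here).
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,            # learning rate reported in the paper
    weight_decay=5e-5,  # weight decay reported in the paper
)

for epoch in range(50):  # 50 epochs; the best validation checkpoint is kept
    feats = torch.randn(8, 512)        # stand-in for frozen-backbone features
    loss = model(feats).pow(2).mean()  # placeholder loss, not the paper's objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A full reproduction would replace the random features with frozen CLIP/ResNet18 embeddings of the dataset images and the placeholder loss with the paper's training objective, then select the checkpoint that performs best on the validation split.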