I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification
Authors: Muhammad Ferjad Naeem, Yongqin Xian, Luc Van Gool, Federico Tombari
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Quantitatively, we demonstrate that our I2DFormer significantly outperforms previous unsupervised semantic embeddings under both zero-shot and generalized zero-shot learning settings on three public datasets. We conduct extensive experiments on Animals with Attributes 2 (AWA2) [57], Caltech-UCSD Birds (CUB) [51] and Oxford Flowers (FLO) [32], which are widely used datasets in ZSL. |
| Researcher Affiliation | Collaboration | Muhammad Ferjad Naeem¹, Yongqin Xian¹, Luc Van Gool¹, Federico Tombari²,³ (¹ETH Zürich, ²TUM, ³Google) |
| Pseudocode | No | The paper presents architectural diagrams and describes methods in text, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/ferjad/I2DFormer. |
| Open Datasets | Yes | We conduct extensive experiments on Animals with Attributes 2 (AWA2) [57], Caltech-UCSD Birds (CUB) [51] and Oxford Flowers (FLO) [32], which are widely used datasets in ZSL. |
| Dataset Splits | Yes | We follow the evaluation protocol and data splits proposed by Xian et al. [57]. |
| Hardware Specification | Yes | We implement our model in PyTorch and train on an Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with their versions. |
| Experiment Setup | Yes | The model is trained with the Adam optimizer with a learning rate of 1e-3 and takes 24 hours to converge. The relative weights of L_CLS and L_local are chosen by ablation. More details are available in the supplementary. |
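
For readers checking the quoted setup against their own reproduction, the following is a minimal, runnable PyTorch sketch of the training recipe in the table: Adam with learning rate 1e-3 and a weighted sum of a CLS-level loss and a local loss. The toy model, the random stand-in data, and the weight `lambda_local` are illustrative assumptions, not the authors' implementation; the paper states only that the relative loss weights are chosen by ablation.

```python
# Sketch of the reported training recipe; everything except the optimizer
# choice and learning rate (Adam, lr 1e-3) is a hypothetical placeholder.
import torch
import torch.nn as nn

class ToyScorer(nn.Module):
    """Stand-in for I2DFormer: maps an image feature to two sets of class scores."""
    def __init__(self, feat_dim=64, num_classes=10):
        super().__init__()
        self.global_head = nn.Linear(feat_dim, num_classes)  # stands in for the CLS-level score
        self.local_head = nn.Linear(feat_dim, num_classes)   # stands in for the local score

    def forward(self, x):
        return self.global_head(x), self.local_head(x)

model = ToyScorer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr as reported in the paper
criterion = nn.CrossEntropyLoss()
lambda_local = 0.1  # hypothetical weight; the paper selects this by ablation

# One illustrative step on random tensors standing in for (image feature, label) batches.
features = torch.randn(8, 64)
labels = torch.randint(0, 10, (8,))

optimizer.zero_grad()
scores_cls, scores_local = model(features)
loss = criterion(scores_cls, labels) + lambda_local * criterion(scores_local, labels)
loss.backward()
optimizer.step()
```

The combined objective mirrors the table's description (an L_CLS term plus a weighted L_local term); a real reproduction would replace the toy heads with the paper's image-to-document attention model and the released code at the repository linked above.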