I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification

Authors: Muhammad Ferjad Naeem, Yongqin Xian, Luc Van Gool, Federico Tombari

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quantitatively, we demonstrate that our I2DFormer significantly outperforms previous unsupervised semantic embeddings under both zero-shot and generalized zero-shot learning settings on three public datasets. We conduct extensive experiments on Animals with Attributes2 (AWA2) [57], Caltech-UCSD Birds (CUB) [51] and Oxford Flowers (FLO) [32], which are widely used datasets in ZSL.
Researcher Affiliation | Collaboration | Muhammad Ferjad Naeem (1), Yongqin Xian (1), Luc Van Gool (1), Federico Tombari (2,3); 1 ETH Zürich, 2 TUM, 3 Google
Pseudocode | No | The paper presents architectural diagrams and describes methods in text, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code available at https://github.com/ferjad/I2DFormer.
Open Datasets | Yes | We conduct extensive experiments on Animals with Attributes2 (AWA2) [57], Caltech-UCSD Birds (CUB) [51] and Oxford Flowers (FLO) [32], which are widely used datasets in ZSL.
Dataset Splits | Yes | We follow the evaluation protocol and data splits proposed by Xian et al. [57].
Hardware Specification | Yes | We implement our model in PyTorch and train on an Nvidia A100 GPU.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | Yes | The model is trained with the Adam optimizer with a learning rate of 1e-3 and takes 24 hours to converge. The relative weights of L_CLS and L_local are chosen by ablation. More details are available in the supplementary.
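
The optimization setup quoted above (Adam at lr 1e-3 with a weighted sum of L_CLS and L_local) can be sketched as follows. This is a minimal illustration, not the authors' released code: the encoders, feature dimensions, the placeholder local loss, and the weight lambda_local are all assumptions, since the paper selects the relative loss weights by ablation.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the image and document encoders projecting into
# a shared embedding space (dimensions are assumptions, not from the paper).
img_encoder = torch.nn.Linear(2048, 300)
doc_encoder = torch.nn.Linear(768, 300)

params = list(img_encoder.parameters()) + list(doc_encoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)  # learning rate quoted from the paper

def training_step(img_feats, doc_feats, labels, lambda_local=0.1):
    """One step minimizing L_CLS + lambda_local * L_local.

    img_feats: (B, 2048) image features, doc_feats: (C, 768) class document
    features, labels: (B,) class indices. lambda_local is a placeholder for
    the ablated relative weight.
    """
    optimizer.zero_grad()
    img_emb = F.normalize(img_encoder(img_feats), dim=-1)
    doc_emb = F.normalize(doc_encoder(doc_feats), dim=-1)
    logits = img_emb @ doc_emb.t()                   # image-to-class scores
    l_cls = F.cross_entropy(logits, labels)          # global (CLS-level) loss
    l_local = torch.zeros((), device=logits.device)  # placeholder for the local
                                                     # image-to-document attention loss
    loss = l_cls + lambda_local * l_local
    loss.backward()
    optimizer.step()
    return loss.item()
```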