Context-Aware Zero-Shot Recognition
Authors: Ruotian Luo, Ning Zhang, Bohyung Han, Linjie Yang (pp. 11709-11716)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed algorithm is evaluated on both zero-shot region classification and zero-shot detection tasks. The results on the Visual Genome (VG) dataset show that the model significantly boosts performance with the additional visual context compared to traditional methods. |
| Researcher Affiliation | Collaboration | Ruotian Luo, TTI-Chicago (rluo@ttic.edu); Ning Zhang, Vaitl Inc. (ning@vaitl.ai); Bohyung Han, Seoul National University (bhhan@snu.ac.kr); Linjie Yang, ByteDance AI Lab (linjie.yang@bytedance.com) |
| Pseudocode | No | No pseudocode or algorithm blocks are present. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the release of their specific source code. It only references a PyTorch Mask/Faster R-CNN implementation on GitHub, but this is a backbone, not their specific code. |
| Open Datasets | Yes | Our algorithm is evaluated on Visual Genome dataset (Krishna et al. 2017), which provides a large number of object categories and diverse object relations; our model based on the proposed context knowledge representation illustrates the clear advantage when applied to various existing methods for zero-shot recognition. We evaluate our method on Visual Genome (VG) dataset (Krishna et al. 2017). |
| Dataset Splits | Yes | Part 1 of the VG dataset is used for training, and randomly sampled images from part 2 are used for testing. This results in 54,913 training images and 7,788 test images. The model is only asked to predict among the unseen categories at test time in the classic setting, while it needs to consider both seen and unseen categories under the generalized setting. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running the experiments. It only mentions using a PyTorch Mask/Faster R-CNN implementation. |
| Software Dependencies | No | The paper mentions using a "PyTorch Mask/Faster R-CNN (He et al. 2017) implementation" and "ResNet-50 (He et al. 2016) as our backbone model" but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use SGD with momentum to optimize all the modules. The instance-level zero-shot inference and relationship inference modules are trained separately in two stages. In the first stage, we train the instance-level zero-shot module on seen categories for 100K iterations. The model is fine-tuned from the pretrained ImageNet classification model. The learning rate is initialized to 0.005 and reduced by a factor of 10 after 60K and 80K iterations. In the second stage, we train the relationship inference module for another 60K iterations with all the other modules fixed. The learning rate is also initialized to 0.005 and reduced by a factor of 10 after 20K and 40K iterations. For all the modules, the weight decay parameter is set to 0.0001 and the momentum to 0.9. The batch size is set to 8, and the batch normalization layers are frozen during training. For WE, CONSE, and GCN, γ is 1; for SYNC, γ is set to 0.5. |
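The classic vs. generalized evaluation settings quoted in the Dataset Splits row can be sketched as a restriction on the candidate label set at test time. This is a minimal illustrative sketch, not the paper's code; the function name, category names, and scores are hypothetical.

```python
def predict(scores, unseen, setting="classic"):
    """Pick the highest-scoring category under a zero-shot setting.

    scores:  dict mapping category name -> compatibility score
    unseen:  set of unseen (test-only) category names
    setting: "classic" restricts candidates to unseen categories;
             "generalized" considers seen and unseen categories alike.
    """
    if setting == "classic":
        candidates = {c: s for c, s in scores.items() if c in unseen}
    else:  # generalized setting: seen categories compete with unseen ones
        candidates = scores
    return max(candidates, key=candidates.get)

# Illustrative scores; "dog" is a seen class, the others are unseen.
scores = {"dog": 0.9, "zebra": 0.4, "okapi": 0.3}
unseen = {"zebra", "okapi"}

predict(scores, unseen, "classic")      # -> "zebra" (seen classes excluded)
predict(scores, unseen, "generalized")  # -> "dog"
```

The generalized setting is typically harder because high-scoring seen categories can shadow the correct unseen label, which is why the two settings are reported separately.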
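The two-stage step learning-rate schedule quoted in the Experiment Setup row (initial rate 0.005, decayed by 10x at fixed iteration milestones) can be sketched as follows. The milestones and factor come from the quoted text; the helper function itself is our own illustration, not the authors' code.

```python
def step_lr(iteration, base_lr, milestones, factor=0.1):
    """Return the learning rate at a given iteration for a step schedule.

    The rate starts at base_lr and is multiplied by `factor` each time
    the iteration count passes a milestone.
    """
    lr = base_lr
    for m in milestones:
        if iteration >= m:
            lr *= factor
    return lr

# Stage 1: instance-level zero-shot module, 100K iterations,
# decayed after 60K and 80K iterations.
step_lr(0, 0.005, [60_000, 80_000])       # -> 0.005 (initial rate)
step_lr(70_000, 0.005, [60_000, 80_000])  # one decay applied
step_lr(90_000, 0.005, [60_000, 80_000])  # two decays applied

# Stage 2: relationship inference module, 60K iterations,
# decayed after 20K and 40K iterations.
step_lr(50_000, 0.005, [20_000, 40_000])  # two decays applied
```

In a PyTorch pipeline this schedule would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR` over an SGD optimizer carrying the quoted momentum (0.9) and weight decay (0.0001).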