Improved Visual-Semantic Alignment for Zero-Shot Object Detection
Authors: Shafin Rahman, Salman Khan, Nick Barnes (pp. 11932-11939)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive results on MS-COCO and Pascal VOC datasets show significant improvements over state of the art. |
| Researcher Affiliation | Collaboration | Shafin Rahman,1,2 Salman Khan,3,1 Nick Barnes1,2 1College of Engineering and Computer Science, Australian National University 2Data61, Commonwealth Scientific and Industrial Research Organisation 3Inception Institute of Artificial Intelligence, Abu Dhabi, UAE |
| Pseudocode | No | The paper describes mathematical formulations and network architectures but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and evaluation protocols available at: https://github.com/salman-h-khan/PL-ZSD_Release |
| Open Datasets | Yes | We evaluate our method with MS-COCO (2014) (Lin et al. 2014) and Pascal VOC (2007/12) (Everingham et al. 2010). |
| Dataset Splits | Yes | With 80 object classes, MS-COCO includes 82,783 training and 40,504 validation images. For the ZSD task, only unseen class performance is of interest. As the test data labels are not known, the ZSD evaluation is done on a subset of validation data. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using RetinaNet (Lin et al. 2018) as the base architecture but does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We train the classification subnet branch with our proposed loss defined in Eq. 6. Similar to (Lin et al. 2018), to address the imbalance between hard and easy examples, we normalize the total classification loss (calculated from ~100k anchors) by the total number of object/positive anchor boxes rather than the total number of anchors. We use standard smooth L1 loss for the box-regression subnet branch. The total loss is the sum of the loss of both branches. Hyper-parameters are set on the validation set: β=5, IoU=0.5. Our model works best with α=0.25 and γ=2.0, which are also the recommended values in FL. Empirically, we found t_s=0.3 and t_u=0.1 generally work well. |
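The setup quoted above can be sketched as a short loss computation. This is a hedged illustration only: a standard focal loss stands in for the paper's Eq. 6 (whose exact form is not reproduced in this report), all array names and shapes are assumptions, and the per-positive-anchor normalization follows the RetinaNet convention the row describes.

```python
import numpy as np

# Hyper-parameters reported in the paper (focal-loss values from Lin et al. 2018).
ALPHA, GAMMA = 0.25, 2.0

def focal_loss(probs, targets, alpha=ALPHA, gamma=GAMMA):
    """Binary focal loss summed over all anchors.

    Stand-in for the paper's proposed classification loss (Eq. 6),
    which this report does not reproduce.
    """
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return np.sum(-alpha_t * (1.0 - p_t) ** gamma
                  * np.log(np.clip(p_t, 1e-12, 1.0)))

def smooth_l1(pred, target, beta=1.0):
    """Standard smooth L1 (Huber) loss for the box-regression branch."""
    diff = np.abs(pred - target)
    return np.sum(np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta))

def total_loss(cls_probs, cls_targets, box_pred, box_target):
    """Sum of both branches; classification loss is normalized by the
    number of positive anchors rather than the total anchor count."""
    num_pos = max(1, int(np.sum(cls_targets == 1)))  # avoid divide-by-zero
    cls_loss = focal_loss(cls_probs, cls_targets) / num_pos
    reg_loss = smooth_l1(box_pred, box_target)
    return cls_loss + reg_loss
```

In a real detector the arrays would cover roughly 100k anchors per image; the normalization by positive anchors keeps the many easy negative anchors from dominating the classification term.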