An embarrassingly simple approach to zero-shot learning

Authors: Bernardino Romera-Paredes, Philip Torr

ICML 2015

Reproducibility assessment: each entry below gives the variable, the result, and the LLM's supporting response.
Research Type: Experimental
LLM response: "In experiments carried out on three standard real datasets, we found that our approach is able to perform significantly better than the state of the art on all of them, obtaining a ratio of improvement up to 17%." "In order to assess our approach and the validity of the statements we made, we conducted a set of experiments on one synthetic and three real datasets, which comprise a standard benchmark of evaluation of zero-shot learning methods."
Researcher Affiliation: Academia
LLM response: "Bernardino Romera-Paredes (BERNARDINO.ROMERAPAREDES@ENG.OX.AC.UK) and Philip H. S. Torr (PHILIP.TORR@ENG.OX.AC.UK), University of Oxford, Department of Engineering Science, Parks Road, Oxford, OX1 3PJ, UK."
Pseudocode: No
LLM response: "This, and the corresponding kernel version that can be derived from eq. (3), are the one-line-of-code solutions we mentioned in the introduction." The paper provides a mathematical expression that can be implemented in one line of code, but it does not present it as pseudocode or an algorithm block.
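For context, the one-line solution referred to above is the paper's closed-form estimate V = (XXᵀ + γI)⁻¹ X Y Sᵀ (SSᵀ + λI)⁻¹. The following is a minimal NumPy sketch of that expression; the function names, argument shapes, and default hyper-parameter values are our illustration, not the authors' released code.

```python
import numpy as np

def eszsl_train(X, Y, S, gamma=1.0, lam=1.0):
    """Closed-form ESZSL solution V = (X X^T + gamma I)^-1 X Y S^T (S S^T + lambda I)^-1.

    X: (d, m) training features, Y: (m, z) labels for the z seen classes,
    S: (a, z) attribute signatures of the seen classes.
    """
    d, a = X.shape[0], S.shape[0]
    left = np.linalg.solve(X @ X.T + gamma * np.eye(d), X @ Y)   # (d, z)
    right = np.linalg.solve(S @ S.T + lam * np.eye(a), S).T      # (z, a)
    return left @ right                                          # V: (d, a)

def eszsl_predict(V, X_new, S_unseen):
    """Assign each new instance to the unseen class maximizing x^T V s'_i."""
    return np.argmax(X_new.T @ V @ S_unseen, axis=1)
```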
Open Source Code: Yes
LLM response: "The code can be found at romera-paredes.com/zsl."
Open Datasets: Yes
LLM response: "We have tried the same real datasets as the ones reported in (Jayaraman & Grauman, 2014), which are the Animals with Attributes dataset (AwA) (Lampert et al., 2009), the SUN scene attributes database (SUN) (Patterson & Hays, 2012), and the aPascal/aYahoo objects dataset (aPY) (Farhadi et al., 2009)."
Dataset Splits: Yes
LLM response: "We create the validation set by grouping all instances belonging to 20% of the training classes chosen at random (without replacement)."
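A class-wise split like the one quoted can be reproduced in a few lines. This sketch is our reading of that sentence; the helper name and random seed are ours.

```python
import numpy as np

def class_wise_split(labels, val_frac=0.2, seed=0):
    """Hold out all instances of a random val_frac of the training classes
    (drawn without replacement) as the validation set."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    n_val = max(1, int(round(val_frac * len(classes))))
    val_classes = rng.choice(classes, size=n_val, replace=False)
    val_mask = np.isin(labels, val_classes)
    return np.where(~val_mask)[0], np.where(val_mask)[0]  # train idx, val idx
```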
Hardware Specification: No
LLM response: "It is also worth mentioning that the latter approach took more than 11 hours to run the scenario with 2000 training instances, whereas ours only took 4.12 seconds." No specific hardware details (such as CPU/GPU models or memory) are provided.
Software Dependencies: No
LLM response: "We used the recently provided DeCAF features of the AwA dataset", features "including SIFT (Lowe, 2004), and PHOG (Bosch et al., 2007)", and "We used combined χ²-kernels, one for each feature channel." The paper names the features and kernels used, but does not provide specific software dependencies with version numbers.
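The paper's quotes above do not spell out the kernel formula, so the sketch below uses one common exponential χ² kernel and simple averaging across feature channels; both the bandwidth gamma and the channel-combination rule are our assumptions.

```python
import numpy as np

def chi2_kernel(A, B, gamma=1.0, eps=1e-10):
    """Exponential chi-squared kernel between rows of A and B:
    k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))."""
    d2 = np.array([[np.sum((a - b) ** 2 / (a + b + eps)) for b in B] for a in A])
    return np.exp(-gamma * d2)

def combined_kernel(channels_A, channels_B):
    """Average one chi-squared kernel per feature channel (assumed rule)."""
    kernels = [chi2_kernel(A, B) for A, B in zip(channels_A, channels_B)]
    return sum(kernels) / len(kernels)
```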
Experiment Setup: Yes
LLM response: "All hyper-parameters required by these methods were tuned by a validation process using in all cases the range of values 10^b, for b = -6, -5, ..., 5, 6." "We use the range of values 10^b, for b = -3, -2, ..., 2, 3, to tune all hyper-parameters."
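Putting the quoted pieces together, the tuning procedure amounts to a log-scale grid search over γ and λ on the class-wise validation split. This sketch reuses the eszsl_train and eszsl_predict functions from the earlier block and is our illustration, not the released code.

```python
import numpy as np
from itertools import product

def tune_eszsl(X_tr, Y_tr, S, X_val, y_val, S_val, b_range=range(-3, 4)):
    """Grid-search gamma and lambda over 10**b for b = -3, ..., 3,
    keeping the pair with the best validation accuracy."""
    best_params, best_acc = None, -np.inf
    for bg, bl in product(b_range, b_range):
        V = eszsl_train(X_tr, Y_tr, S, gamma=10.0 ** bg, lam=10.0 ** bl)
        acc = np.mean(eszsl_predict(V, X_val, S_val) == y_val)
        if acc > best_acc:
            best_params, best_acc = (10.0 ** bg, 10.0 ** bl), acc
    return best_params, best_acc
```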