LSDA: Large Scale Detection through Adaptation

Authors: Judy Hoffman, Sergio Guadarrama, Eric S Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluation on the ImageNet LSVRC-2013 detection challenge demonstrates the efficacy of our approach. This algorithm enables us to produce a >7.6K detector by using available classification data from leaf nodes in the ImageNet tree. We additionally demonstrate how to modify our architecture to produce a fast detector (running at 2fps for the 7.6K detector).
Researcher Affiliation | Academia | Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue (EECS, UC Berkeley; EE, Tsinghua University; {jhoffman, sguada, tzeng, jdonahue}@eecs.berkeley.edu, hrh11@mails.tsinghua.edu.cn); Ross Girshick, Trevor Darrell, Kate Saenko (EECS, UC Berkeley; CS, UMass Lowell; {rbg, trevor}@eecs.berkeley.edu, saenko@cs.uml.edu)
Pseudocode | No | The paper describes the algorithm in text and uses figures to illustrate network components, but it does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Models and software are available at lsda.berkeleyvision.org. We implemented our CNN architectures and execute all fine-tuning using the open source software package Caffe [24] and have made our model definitions and weights publicly available. We have released the 7.6K model and code to run detection (both the way presented in this paper and our faster version) at lsda.berkeleyvision.org. (See the pycaffe loading sketch after this table.)
Open Datasets | Yes | Evaluation on the ImageNet LSVRC-2013 detection challenge demonstrates the efficacy of our approach. We start by pre-training the CNN on the ILSVRC2012 classification dataset, which contains 1.2 million classification-labeled images of 1000 categories.
Dataset Splits | Yes | The training set has 400K annotated images and on average 1.534 object classes per image. The validation set has 20K annotated images with 50K annotated objects. Next, we split the ILSVRC2013 validation set in half as [1] did, producing two sets: val1 and val2. (See the split sketch after this table.)
Hardware Specification | No | The paper mentions detection speeds ('running at 2fps for the 7.6K detector', 'reducing our detection time down to half a second per image') but does not specify any hardware details such as GPU/CPU models or specific computing platforms used for experiments.
Software Dependencies | No | We implemented our CNN architectures and execute all fine-tuning using the open source software package Caffe [24] and have made our model definitions and weights publicly available. Caffe is named as the framework, but no version numbers or other dependency details are reported.
Experiment Setup | No | The paper describes the general training process, including pre-training, fine-tuning, and layer modifications. However, it does not provide specific numerical values for hyperparameters such as learning rate, batch size, or number of epochs in the main text. (See the assumption-labeled fine-tuning configuration sketch after this table.)
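
For the Open Source Code row above, the following minimal sketch shows how a released Caffe model definition and weight file could be loaded for inference with pycaffe. The file names (deploy.prototxt, lsda_7.6k.caffemodel) and the input image path are assumptions for illustration; the actual artifacts are whatever is distributed at lsda.berkeleyvision.org.

```python
# Minimal sketch: load a released Caffe model definition and weights with pycaffe.
# File names below are assumptions; use the files distributed at lsda.berkeleyvision.org.
import numpy as np
import caffe

caffe.set_mode_cpu()  # or caffe.set_mode_gpu() if a GPU build of Caffe is available

# Hypothetical file names for the released model definition and trained weights.
net = caffe.Net('deploy.prototxt', 'lsda_7.6k.caffemodel', caffe.TEST)

# Preprocess one image to the network's expected input layout.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))      # HxWxC -> CxHxW
transformer.set_raw_scale('data', 255)            # [0,1] float image -> [0,255]
transformer.set_channel_swap('data', (2, 1, 0))   # RGB -> BGR (Caffe convention)

image = caffe.io.load_image('example.jpg')        # hypothetical input image
net.blobs['data'].data[...] = transformer.preprocess('data', image)

# Forward pass; the output blob name depends on the released prototxt.
output = net.forward()
scores = list(output.values())[0]
print('Top predicted class index:', int(np.argmax(scores)))
```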
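For the Dataset Splits row, this is a minimal sketch of splitting a validation image list in half to produce val1 and val2, in the spirit of the quoted text (following [1]). The image-list file name and the fixed random seed are assumptions; the exact split used by the paper is the one defined in [1].

```python
# Minimal sketch: split a validation image list into two halves (val1 / val2).
# 'val_image_ids.txt' is a hypothetical file with one image ID per line;
# the split actually used by the paper follows the one published with [1].
import random

with open('val_image_ids.txt') as f:
    image_ids = [line.strip() for line in f if line.strip()]

random.seed(0)            # fixed seed so the split is reproducible (assumption)
random.shuffle(image_ids)

half = len(image_ids) // 2
val1, val2 = image_ids[:half], image_ids[half:]

with open('val1.txt', 'w') as f:
    f.write('\n'.join(val1) + '\n')
with open('val2.txt', 'w') as f:
    f.write('\n'.join(val2) + '\n')

print(f'val1: {len(val1)} images, val2: {len(val2)} images')
```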
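For the Experiment Setup row, the sketch below writes a Caffe solver file of the kind such fine-tuning would require. Every numeric value (learning rate, step size, iteration count) is a placeholder assumption, since the paper does not report these hyperparameters; only the field names are standard Caffe solver parameters.

```python
# Minimal sketch: generate a Caffe solver definition for fine-tuning.
# All numeric values are placeholder assumptions; the paper does not report them.
# Field names are standard Caffe solver parameters.
solver_text = """net: "train_val.prototxt"        # hypothetical network definition file
base_lr: 0.001                   # assumed starting learning rate for fine-tuning
lr_policy: "step"
gamma: 0.1
stepsize: 20000                  # assumed: drop the learning rate every 20k iterations
momentum: 0.9
weight_decay: 0.0005
max_iter: 100000                 # assumed total number of fine-tuning iterations
test_interval: 1000
test_iter: 100
snapshot: 10000
snapshot_prefix: "lsda_finetune"
solver_mode: GPU
"""

with open('solver.prototxt', 'w') as f:
    f.write(solver_text)

print('Wrote solver.prototxt (placeholder hyperparameters)')
```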