Object detectors emerge in Deep Scene CNNs
Authors: Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Current deep neural networks achieve remarkable performance at a number of vision tasks, surpassing techniques based on hand-crafted features. However, while the structure of the representation in hand-crafted features is often clear and interpretable, in the case of deep networks it remains unclear what the nature of the learned representation is and why it works so well. A convolutional neural network (CNN) trained on ImageNet (Deng et al., 2009) significantly outperforms the best hand-crafted features on the ImageNet challenge (Russakovsky et al., 2014). |
| Researcher Affiliation | Academia | Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba Computer Science and Artificial Intelligence Laboratory, MIT {bolei,khosla,agata,oliva,torralba}@mit.edu |
| Pseudocode | No | The paper describes procedures and uses diagrams (e.g., Fig. 3 for the RF estimation pipeline) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions "Caffe: An open source convolutional architecture for fast feature embedding, 2013." by Jia (2013), which is a tool used, not the authors' own implementation code. It also states "Scene recognition demo of Places-CNN is available at http://places.csail.mit.edu/demo.html.", which refers to a live demo, not the source code for the methodology described in the paper. |
| Open Datasets | Yes | The ImageNet-CNN from Jia (2013) is trained on 1.3 million images from 1000 object categories of ImageNet (ILSVRC 2012) and achieves a top-1 accuracy of 57.4%. [...] With the same network architecture, Places-CNN is trained on 2.4 million images from 205 scene categories of the Places Database (Zhou et al., 2014), and achieves a top-1 accuracy of 50.0%. [...] The SUN database contains 8220 fully annotated images from the same 205 place categories used to train Places-CNN. |
| Dataset Splits | No | The paper mentions using ImageNet, Places Database, and SUN dataset, which are standard, but it does not explicitly state the specific train/validation/test splits used for their experiments (e.g., 80/10/10 split, or specific sample counts for each split). |
| Hardware Specification | No | The acknowledgments section mentions "a hardware donation from NVIDIA Corporation" but does not specify any particular GPU model or other hardware components (CPU model, RAM, etc.) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Caffe" but does not provide specific version numbers for any software libraries, frameworks, or solvers used in its experiments. |
| Experiment Setup | No | The paper specifies the network architecture in Table 1 and states that networks were "trained from scratch using only the specified dataset." However, it does not provide explicit details on hyperparameters such as learning rate, batch size, number of epochs, or the specific optimizer used (e.g., SGD with momentum, Adam settings). |