PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

Authors: Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the SUN RGB-D dataset show that the proposed method significantly outperforms existing RGB-based approaches for 3D object detection.
Researcher Affiliation | Academia | All six authors are affiliated with UCLA (per their email domains): Siyuan Huang (Department of Statistics, huangsiyuan@ucla.edu); Yixin Chen (Department of Statistics, ethanchen@ucla.edu); Tao Yuan (Department of Statistics, taoyuan@ucla.edu); Siyuan Qi (Department of Computer Science, syqi@cs.ucla.edu); Yixin Zhu (Department of Statistics, yixin.zhu@ucla.edu); Song-Chun Zhu (Department of Statistics, sczhu@stat.ucla.edu).
Pseudocode | No | The paper describes its algorithms and formulations in prose and mathematical equations but provides no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'We implement our framework based on the code of Massa and Girshick [83].' This indicates the authors built on an existing framework, but there is no explicit statement that their PerspectiveNet implementation is open source, and no repository link is given.
Open Datasets | Yes | 'We conduct comprehensive experiments on SUN RGB-D [46] dataset.'
Dataset Splits | No | The paper mentions 4,783 training images and 4,220 test images but does not describe a validation split or its size.
Hardware Specification | Yes | 'We use SGD for optimization with a batch size of 32 on a desktop with 4 Nvidia TITAN RTX cards (8 images each card).'
Software Dependencies | No | The paper states: 'We implement our framework based on the code of Massa and Girshick [83].' This implies PyTorch (the cited reference is 'maskrcnn-benchmark ... in PyTorch'), but no version numbers are given for PyTorch or any other dependency, which reproducibility requires.
Experiment Setup | Yes | 'We resize the images so that the shorter edges are all 800 pixels. To avoid over-fitting, a data augmentation procedure is performed by randomly flipping the images or randomly shifting the 2D bounding boxes with corresponding labels during the training. We use SGD for optimization with a batch size of 32 [...] The learning rate starts at 0.01 and decays by 0.1 at 30,000 and 35,000 iterations.'
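The learning-rate schedule quoted above (start at 0.01, decay by 0.1 at 30,000 and 35,000 iterations) is a standard step-decay policy. The sketch below is an illustrative reconstruction, not the authors' code; the function name and keyword defaults are assumptions, with only the numeric values taken from the paper.

```python
def learning_rate(iteration, base_lr=0.01, decay_factor=0.1,
                  milestones=(30_000, 35_000)):
    """Step-decay schedule as described in the paper:
    lr = base_lr * decay_factor ** (number of milestones already passed)."""
    passed = sum(1 for m in milestones if iteration >= m)
    return base_lr * decay_factor ** passed

# Before 30k iterations the rate stays at 0.01; it drops to ~0.001
# at 30k and to ~0.0001 at 35k.
```

In a PyTorch setup such as maskrcnn-benchmark, the same behavior would typically be obtained with `torch.optim.SGD` plus a multi-step scheduler with milestones [30000, 35000] and gamma 0.1.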