Integrated perception with recurrent multi-task neural networks

Authors: Hakan Bilen, Andrea Vedaldi

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical evaluation in Section 4 demonstrates the benefits of the approach: sharing features between different tasks is not only economical but sometimes also better for accuracy, and integrating the outputs of different tasks into the shared representation yields further accuracy improvements.
Researcher Affiliation | Academia | Hakan Bilen, Andrea Vedaldi. Visual Geometry Group, University of Oxford. {hbilen,vedaldi}@robots.ox.ac.uk
Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found. The paper describes the architecture and processes in natural language and mathematical equations.
Open Source Code | No | No explicit statement providing access to the authors' own source code for their methodology was found. The paper mentions using "the publicly available CNN toolbox MatConvNet [27]".
Open Datasets | Yes | PASCAL VOC 2010 [10] and PASCAL-Part [7]: the dataset contains 4,998 training and 5,105 validation images for 20 object categories, with ground-truth bounding-box annotations for the target categories; the PASCAL-Part dataset [7] provides bounding-box annotations of object parts. PASCAL VOC 2007 [10]: the dataset consists of 2,501 training, 2,510 validation, and 5,011 test images with bounding-box annotations for 20 object categories.
Dataset Splits | Yes | PASCAL VOC 2010: 4,998 training and 5,105 validation images for 20 object categories, with ground-truth bounding-box annotations for the target categories. PASCAL VOC 2007: 2,501 training, 2,510 validation, and 5,011 test images with bounding-box annotations for 20 object categories.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments were provided. The paper names a VGG-M-1024 network and general training parameters but no hardware specifications.
Software Dependencies | No | The paper mentions using "the publicly available CNN toolbox MatConvNet [27]" but does not provide a specific version number for it or for any other software dependencies.
Experiment Setup | Yes | The image encoder φ_enc^img is initialized from the pre-trained VGG-M model using sections conv1 to conv5. Max pooling in SPP is performed in a grid of 6 × 6 spatial bins, as in [14, 11]. The fully connected layers used for softmax classification and bounding-box regression in the object and part detection tasks are initialized from zero-mean Gaussian distributions with standard deviations of 0.01 and 0.001, respectively. All layers use a learning-rate multiplier of 1 for filters and 2 for biases. The parameters are optimized with SGD at a learning rate of 0.001 for 6 epochs, lowered to 0.0001 for another 6 epochs.
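The initialization and learning-rate schedule described in the setup can be sketched as follows. This is a minimal, framework-free illustration, not the authors' MatConvNet code; the layer shapes (21 classes, 1024-dimensional features, 84 regression outputs) are assumptions for illustration, while the standard deviations (0.01 and 0.001) and the 6-epoch/6-epoch schedule come from the paper.

```python
import random

random.seed(0)

def gaussian_init(shape, std):
    """Zero-mean Gaussian initialization, as described for the new fc heads."""
    rows, cols = shape
    return [[random.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

# Hypothetical layer sizes; the paper only states the init std values
# (0.01 for the softmax classification head, 0.001 for bbox regression).
cls_weights = gaussian_init((21, 1024), std=0.01)
bbox_weights = gaussian_init((84, 1024), std=0.001)

def learning_rate(epoch):
    """SGD schedule from the paper: 0.001 for 6 epochs, then 0.0001 for 6 more."""
    return 0.001 if epoch < 6 else 0.0001

# Twelve epochs in total: six at each learning rate.
schedule = [learning_rate(e) for e in range(12)]
```

The per-layer multipliers mentioned in the paper (1 for filters, 2 for biases) would scale these base rates per parameter group in an actual training loop.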