Convolutional Neural Networks with Intra-Layer Recurrent Connections for Scene Labeling
Authors: Ming Liang, Xiaolin Hu, Bo Zhang
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Over two benchmark datasets, Stanford Background and Sift Flow, the model outperforms many state-of-the-art models in accuracy and efficiency. Experiments are performed over two benchmark datasets for scene labeling, Sift Flow [15] and Stanford Background [6]. |
| Researcher Affiliation | Academia | Ming Liang Xiaolin Hu Bo Zhang Tsinghua National Laboratory for Information Science and Technology (TNList) Department of Computer Science and Technology Center for Brain-Inspired Computing Research (CBICR) Tsinghua University, Beijing 100084, China liangm07@mails.tsinghua.edu.cn, {xlhu,dcszb}@tsinghua.edu.cn |
| Pseudocode | No | The paper provides mathematical equations and descriptive text of the model and training process, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions 'The models are implemented using Caffe [10]' but does not provide a link to its own open-source code or explicitly state its release. |
| Open Datasets | Yes | Experiments are performed over two benchmark datasets for scene labeling, Sift Flow [15] and Stanford Background [6]. |
| Dataset Splits | Yes | For the Sift Flow dataset, the hyper-parameters are determined on a separate validation set. The Sift Flow dataset contains 2688 color images, all of which have the size of 256 × 256 pixels. Among them 2488 images are training data, and the remaining 200 images are testing data. The Stanford Background dataset contains 715 color images, most of which have the size of 320 × 240 pixels. Following [6], 5-fold cross validation is used over this dataset. In each fold there are 572 training images and 143 testing images. |
| Hardware Specification | Yes | On a GTX Titan black GPU, it takes about 0.03 second for the RCNN and 0.02 second for the RCNN-small to process an image. |
| Software Dependencies | No | The models are implemented using Caffe [10]. This statement names the software but does not provide a specific version number for Caffe or any other dependencies. |
| Experiment Setup | Yes | The numbers of feature maps in these layers are 32, 64 and 128. The filter size in the first convolutional layer is 7 × 7, and the feed-forward and recurrent filters in RCLs are all 3 × 3. The dropout ratio is 0.5 and the weight decay coefficient is 0.0001. The base learning rate is 0.001, which is reduced to 0.0001 when the training error enters a plateau. Overall, about ten million patches have been input to the model during training. |
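The core building block the setup row refers to is the recurrent convolutional layer (RCL): a feed-forward convolution of the layer input is injected at every step, a recurrent convolution of the previous state is added, and the sum passes through a ReLU. The single-channel NumPy sketch below illustrates that update rule only; the function names, the naive convolution loop, and the single-channel restriction are our assumptions, not the authors' Caffe implementation.

```python
import numpy as np

def conv2d_same(x, w):
    # Naive single-channel "same" convolution (cross-correlation form)
    # via zero padding; assumes an odd-sized kernel w.
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out

def rcl_forward(u, w_ff, w_rec, steps=3):
    # Recurrent convolutional layer: the feed-forward drive conv(u, w_ff)
    # is present at every iteration, while the recurrent term conv(x, w_rec)
    # refines the state over `steps` unfolded time steps.
    x = np.maximum(conv2d_same(u, w_ff), 0.0)          # t = 0: feed-forward only
    for _ in range(steps):
        x = np.maximum(conv2d_same(u, w_ff) + conv2d_same(x, w_rec), 0.0)
    return x
```

With 3 × 3 feed-forward and recurrent filters, as quoted in the setup row, each extra iteration enlarges the effective receptive field of a unit without adding parameters, which is the motivation the paper gives for intra-layer recurrence.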
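The dataset-splits row states that the Stanford Background dataset (715 images) is evaluated with 5-fold cross validation, giving 572 training and 143 testing images per fold. A minimal sketch of such a partition, assuming contiguous equal-sized folds (the paper does not specify how the folds were drawn):

```python
def five_fold_splits(n_images=715, n_folds=5):
    # Partition image indices into equal contiguous folds; each fold
    # serves as the test set once, the remaining folds as training data.
    idx = list(range(n_images))
    fold_size = n_images // n_folds          # 715 // 5 = 143
    splits = []
    for f in range(n_folds):
        test = idx[f * fold_size:(f + 1) * fold_size]
        train = idx[:f * fold_size] + idx[(f + 1) * fold_size:]
        splits.append((train, test))
    return splits
```

Each of the five (train, test) pairs has 572 and 143 indices respectively, matching the counts quoted from the paper.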