Beyond Grids: Learning Graph Representations for Visual Recognition
Authors: Yin Li, Abhinav Gupta
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on several challenging visual recognition tasks, including semantic segmentation, object detection and object instance segmentation. For all tasks, our method outperforms state-of-the-art methods. We test our method on three important recognition tasks: (1) semantic segmentation, (2) object detection, and (3) object instance segmentation. |
| Researcher Affiliation | Academia | Yin Li Department of Biostatistics & Medical Informatics Department of Computer Sciences University of Wisconsin Madison yin.li@wisc.edu Abhinav Gupta The Robotics Institute School of Computer Science Carnegie Mellon University abhinavg@cs.cmu.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions 'Detectron' and provides its GitHub link (https://github.com/facebookresearch/detectron) as a baseline upon which their method is built. However, it does not explicitly state that the authors are releasing their own code for the methodology described in this paper or provide a direct link to their implementation. |
| Open Datasets | Yes | We use ADE20K dataset [37] for semantic segmentation. For both object detection and instance segmentation, we use COCO dataset from [43]. |
| Dataset Splits | Yes | We follow the same evaluation protocol as [13] and train our method on the 20K training set. We report the pixel-level accuracy and mean Intersection over Union (mIoU) on the 2K validation set. We train using the union of 80k train images and a 35k subset of val images (trainval35k), and report results on the remaining 5k val images (minival). |
| Hardware Specification | No | The paper mentions running experiments 'across 4 GPUs' but does not specify the make, model, or any other specific details about the hardware used (e.g., CPU, memory, specific GPU type). |
| Software Dependencies | No | The paper mentions various frameworks and models like 'ResNet-50/101', 'Mask R-CNN', 'Dilated FCN', 'PSPNet', and 'EncNet'. However, it does not provide specific version numbers for any software dependencies such as Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | Our base model attaches 4 GCUs to the last block of a backbone network and concatenates their outputs, followed by convolutions for pixel labeling. These GCUs have (2, 4, 8, 32) vertices and output dimensions of d = 256. We use ResNet-50/101 [38] pre-trained on ImageNet [39] as our backbone network. We add dilation to the last two residual blocks, so the output is down-sampled by a factor of 8. We upsample the result to the original resolution using bilinear interpolation. We crop the image into a fixed size (505x505) with data augmentations (random flip, rotation, scale) and train for 120 epochs. We also add an auxiliary loss after the 4th residual block with a weight of 0.4. The network is trained using SGD with batch size 16 (across 4 GPUs), learning rate 0.01 and momentum 0.9. We also adopt a power decay schedule for the learning rate [40], and enable synchronized batch normalization. For object detection and instance segmentation, we train the model using SGD with a batch size of 8 across 4 GPUs. Following the training schedule (1x) in [44], we linearly scale the training iterations (180K) and initial learning rate (0.01) based on our batch size. The learning rate is decreased by a factor of 10 at 120K and 160K iterations. We also freeze the batch normalization layers. |
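The two learning-rate schedules described in the setup row can be made concrete with a short sketch. This is not the authors' code: the power ("poly") decay exponent is not stated in the extracted text, so `power=0.9` (a common choice for segmentation training) is an assumption, as are the function names.

```python
def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    """Power-decay ("poly") schedule used for segmentation training.

    base_lr=0.01 per the paper; power=0.9 is an assumed, commonly used
    exponent -- the paper excerpt does not specify it.
    """
    return base_lr * (1.0 - step / max_steps) ** power


def step_lr(base_lr: float, step: int,
            milestones: tuple = (120_000, 160_000), gamma: float = 0.1) -> float:
    """Step schedule for detection/instance segmentation: the learning rate
    drops by a factor of 10 at 120K and 160K iterations (180K total)."""
    factor = gamma ** sum(step >= m for m in milestones)
    return base_lr * factor


# Example: LR at the start, after the first drop, and after the second drop.
print(step_lr(0.01, 0))        # 0.01
print(step_lr(0.01, 130_000))  # 0.001
print(step_lr(0.01, 170_000))  # 0.0001
```

Note that the paper also linearly scales iterations and the initial rate with batch size, so these constants apply at the stated batch size of 8.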