Geometric Scene Parsing with Hierarchical LSTM
Authors: Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments show that our model is capable of parsing scene geometric structures and outperforming several state-of-the-art methods by large margins. |
| Researcher Affiliation | Academia | (1) Sun Yat-sen University, Guangzhou, China; (2) San Diego State University, U.S. |
| Pseudocode | No | The paper includes mathematical equations for the LSTM unit but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that open-source code for the described methodology is available. |
| Open Datasets | Yes | We validate the effectiveness of the proposed H-LSTM on three public datasets, including SIFT-Flow dataset [Liu et al., 2011a], LM+SUN dataset [Tighe and Lazebnik, 2013] and Geometric Context dataset [Hoiem et al., 2007]. |
| Dataset Splits | No | The paper specifies training and testing sets but does not explicitly mention or detail a separate validation set for model tuning during training. For example, 'The SIFT-Flow consists of 2,488 training images and 200 testing images.' and 'Following [Tighe and Lazebnik, 2013], we apply 45,176 images as training data and 500 images as test ones.' |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It only mentions 'pre-training CNN model'. |
| Software Dependencies | No | The paper mentions using a 'modified VGG-16 model' but does not specify any software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | The scale of input image is fixed as 321x321 for LM+SUN and Geometric Context datasets. During the training phase, the learning rates of transition layer, P-LSTM layers and MS-LSTM layers are initialized as 0.001 and that of pre-training CNN model is initialized as 0.0001. The dimension of hidden cells and memory cells, which is corresponding to the symbol d in Sec. 3, is set as 64 in both P-LSTM and MS-LSTM. |
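
The experiment-setup details reported above can be summarized in a small configuration sketch. The field names and the module breakdown below are assumptions for illustration; only the numeric values (321x321 input scale, learning rates of 0.001 and 0.0001, hidden/memory dimension 64) come from the paper's quoted setup.

```python
# Minimal sketch of the reported H-LSTM training configuration.
# Field names are hypothetical; values follow the "Experiment Setup" row above.
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class HLSTMTrainConfig:
    input_size: Tuple[int, int] = (321, 321)   # LM+SUN and Geometric Context
    hidden_dim: int = 64                       # symbol d in Sec. 3 (P-LSTM and MS-LSTM)
    memory_dim: int = 64
    learning_rates: Dict[str, float] = field(default_factory=lambda: {
        "transition_layer": 1e-3,
        "p_lstm_layers": 1e-3,
        "ms_lstm_layers": 1e-3,
        "pretrained_cnn": 1e-4,                # modified VGG-16 backbone
    })


if __name__ == "__main__":
    print(HLSTMTrainConfig())
```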