Geometric Scene Parsing with Hierarchical LSTM

Authors: Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin

IJCAI 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our extensive experiments show that our model is capable of parsing scene geometric structures and outperforming several state-of-the-art methods by large margins." |
| Researcher Affiliation | Academia | Sun Yat-sen University, Guangzhou, China; San Diego State University, U.S. |
| Pseudocode | No | The paper includes mathematical equations for the LSTM unit but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that open-source code for the described methodology is available. |
| Open Datasets | Yes | "We validate the effectiveness of the proposed H-LSTM on three public datasets, including SIFT-Flow dataset [Liu et al., 2011a], LM+SUN dataset [Tighe and Lazebnik, 2013] and Geometric Context dataset [Hoiem et al., 2007]." |
| Dataset Splits | No | The paper specifies training and testing sets but does not mention a separate validation set for model tuning. For example: "The SIFT-Flow consists of 2,488 training images and 200 testing images." and "Following [Tighe and Lazebnik, 2013], we apply 45,176 images as training data and 500 images as test ones." |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models; it only mentions a "pre-training CNN model". |
| Software Dependencies | No | The paper mentions using a "modified VGG-16 model" but does not specify any software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | "The scale of input image is fixed as 321x321 for LM+SUN and Geometric Context datasets. During the training phase, the learning rates of transition layer, P-LSTM layers and MS-LSTM layers are initialized as 0.001 and that of pre-training CNN model is initialized as 0.0001. The dimension of hidden cells and memory cells, corresponding to the symbol d in Sec. 3, is set as 64 in both P-LSTM and MS-LSTM." |
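Since no official code was released, the quoted experiment setup can only be reconstructed approximately. The sketch below collects the reported hyperparameters into a small configuration; all identifier names are illustrative assumptions, not taken from any released implementation.

```python
# Hypothetical hyperparameter summary assembled from the paper's quoted
# experiment setup; key names are illustrative (no official code exists).
H_LSTM_CONFIG = {
    "input_size": (321, 321),   # fixed input scale (LM+SUN, Geometric Context)
    "hidden_dim": 64,           # d: hidden/memory cell size in P-LSTM and MS-LSTM
    "lr_new_layers": 1e-3,      # transition layer, P-LSTM layers, MS-LSTM layers
    "lr_pretrained_cnn": 1e-4,  # pre-trained (modified VGG-16) CNN backbone
}

def initial_learning_rate(layer_group: str) -> float:
    """Return the reported initial learning rate for a parameter group."""
    if layer_group in ("transition", "p_lstm", "ms_lstm"):
        return H_LSTM_CONFIG["lr_new_layers"]
    # The pre-trained CNN is trained with a 10x smaller learning rate.
    return H_LSTM_CONFIG["lr_pretrained_cnn"]
```

Splitting the learning rate this way (new layers at 0.001, pre-trained backbone at 0.0001) mirrors the common fine-tuning practice the paper describes.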