Generative Image Modeling Using Spatial LSTMs
Authors: Lucas Theis, Matthias Bethge
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We here introduce a recurrent image model based on multidimensional long short-term memory units which are particularly suited for image modeling due to their spatial structure. Our model scales to images of arbitrary size and its likelihood is computationally tractable. We find that it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting. (A hedged sketch of a spatial LSTM step follows the table.) |
| Researcher Affiliation | Academia | Lucas Theis, University of Tübingen, 72076 Tübingen, Germany, lucas@bethgelab.org; Matthias Bethge, University of Tübingen, 72076 Tübingen, Germany, matthias@bethgelab.org |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about making its code open source or links to code repositories for the described methodology. |
| Open Datasets | Yes | Several recent image models have been evaluated on small image patches sampled from the Berkeley segmentation dataset (BSDS300) [25]. Another dataset frequently used to test generative image models is the dataset published by van Hateren and van der Schaaf [48]. We used a set of 1,000 images, where each image is 256 by 256 pixels in size. We compare the performance of RIDE to the MCGSM and a very recently introduced deep multiscale model based on a diffusion process [35]. The same 100 images as in previous literature [35, 41] were used for evaluation and we used the remaining images for training. We used several 640 by 640 pixel textures published by Brodatz [2]. |
| Dataset Splits | Yes | The training set of 200 images was split into 180 images for training and 20 images for validation, while the test set contained 100 images. We used 1.6 × 10⁶ patches for training, 1.8 × 10⁵ patches for validation, and 10⁶ test patches for evaluation. (A patch-sampling sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | Spatial LSTMs were implemented using the Caffe framework [17]. The paper mentions Caffe but does not specify its version number. |
| Experiment Setup | Yes | RIDE was trained using stochastic gradient descent with a batch size of 50, momentum of 0.9, and a decreasing learning rate varying between 1 and 10⁻⁴. After each pass through the training set, the MCGSM of RIDE was finetuned using L-BFGS for up to 500 iterations before decreasing the learning rate. No regularization was used except for early stopping based on a validation set. Except where indicated otherwise, the recurrent model used a 5 pixel wide neighborhood and an MCGSM with 32 components and 32 quadratic features (bₙ in Section 2.1). (A toy sketch of this training schedule follows the table.) |
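The Research Type row quotes the abstract's description of multidimensional LSTM units, whose gates extend the usual LSTM recurrence along both image axes. Below is a minimal NumPy sketch of one such 2D sweep, assuming a single bias-free weight matrix and a two-forget-gate layout (one per causal neighbor); all names are illustrative, and the paper's actual RIDE model, implemented in Caffe, feeds these hidden states into an MCGSM rather than reading them out directly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_lstm_sweep(image, W, n_hidden):
    """One sweep of a 2D (spatial) LSTM over a grayscale image.

    Each position (i, j) receives the pixel value plus the hidden and
    cell states of its causal neighbors (i-1, j) and (i, j-1), so
    h[i, j] summarizes only the pixels above and to the left.
    """
    H, Wd = image.shape
    h = np.zeros((H + 1, Wd + 1, n_hidden))  # zero-padded boundary states
    c = np.zeros((H + 1, Wd + 1, n_hidden))
    for i in range(1, H + 1):
        for j in range(1, Wd + 1):
            # concatenate the input pixel with both neighboring hidden states
            x = np.concatenate(([image[i - 1, j - 1]],
                                h[i - 1, j], h[i, j - 1]))
            z = W @ x  # one weight matrix produces all gate pre-activations
            f_row, f_col, in_gate, out_gate, g = np.split(z, 5)
            # two forget gates: one per spatial predecessor
            c[i, j] = (sigmoid(f_row) * c[i - 1, j]
                       + sigmoid(f_col) * c[i, j - 1]
                       + sigmoid(in_gate) * np.tanh(g))
            h[i, j] = sigmoid(out_gate) * np.tanh(c[i, j])
    return h[1:, 1:]  # one hidden state vector per pixel

# toy usage: 8x8 image, 4 hidden units
rng = np.random.default_rng(0)
n_hidden = 4
W = rng.normal(scale=0.1, size=(5 * n_hidden, 1 + 2 * n_hidden))
states = spatial_lstm_sweep(rng.normal(size=(8, 8)), W, n_hidden)
print(states.shape)  # (8, 8, 4)
```

The zero-padded first row and column act as boundary states, so every hidden state depends only on pixels above and to the left; this causal structure is what keeps the factorized likelihood tractable, as the abstract claims.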
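The Dataset Splits row implies a patch-extraction step between the image-level split and training. The helper below is one plausible way to draw such patches; `sample_patches`, the 8 by 8 patch size, the 64 by 64 stand-in images, and the reduced patch counts are illustrative assumptions, with the paper's reported counts noted in comments.

```python
import numpy as np

def sample_patches(images, n_patches, size, rng):
    """Draw random square patches, each from a randomly chosen image."""
    patches = np.empty((n_patches, size, size))
    for k in range(n_patches):
        img = images[rng.integers(len(images))]
        i = rng.integers(img.shape[0] - size + 1)
        j = rng.integers(img.shape[1] - size + 1)
        patches[k] = img[i:i + size, j:j + size]
    return patches

rng = np.random.default_rng(0)
# stand-ins for the 180 training / 20 validation images named above
train_imgs = [rng.normal(size=(64, 64)) for _ in range(180)]
val_imgs = [rng.normal(size=(64, 64)) for _ in range(20)]

# the paper reports 1.6e6 training and 1.8e5 validation patches;
# tiny counts are used here so the sketch runs instantly
train_patches = sample_patches(train_imgs, 1000, 8, rng)
val_patches = sample_patches(val_imgs, 100, 8, rng)
print(train_patches.shape, val_patches.shape)
```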
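The Experiment Setup row specifies the optimization schedule precisely enough to mirror on a toy problem. In the sketch below, batch size 50, momentum 0.9, the learning rate decreasing from 1 to 10⁻⁴, the per-epoch L-BFGS finetuning for up to 500 iterations, and early stopping as the only regularization all follow the quoted setup; the least-squares objective, the five-step rate schedule, and the stop-on-first-non-improvement rule are assumptions made to keep the example self-contained.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# toy stand-in objective: least squares instead of RIDE's log-likelihood
X = rng.normal(size=(5000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=5000)
X_val = rng.normal(size=(500, 10))
y_val = X_val @ w_true + 0.1 * rng.normal(size=500)

def loss_grad(w, Xb, yb):
    """Return 0.5 * mean squared error and its gradient."""
    r = Xb @ w - yb
    return 0.5 * np.mean(r ** 2), Xb.T @ r / len(yb)

w = np.zeros(10)
velocity = np.zeros(10)
momentum, batch_size = 0.9, 50             # values reported in the paper
best = np.inf
for lr in (1.0, 1e-1, 1e-2, 1e-3, 1e-4):   # decreasing rate, 1 down to 1e-4
    for s in range(0, len(X), batch_size):
        _, g = loss_grad(w, X[s:s + batch_size], y[s:s + batch_size])
        velocity = momentum * velocity - lr * g   # SGD with momentum
        w += velocity
    # After each pass the paper finetunes the MCGSM stage with L-BFGS for
    # up to 500 iterations; here L-BFGS simply polishes the toy parameters.
    w = minimize(loss_grad, w, args=(X, y), jac=True,
                 method="L-BFGS-B", options={"maxiter": 500}).x
    # Early stopping on a validation set is the only regularization used.
    val_loss = loss_grad(w, X_val, y_val)[0]
    if val_loss >= best:
        break
    best = val_loss
print("validation loss:", best)
```

Note how the learning rate is decreased only after the L-BFGS finetuning step, matching the quoted order of operations.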