Deep Generative Stochastic Networks Trainable by Backprop
Authors: Yoshua Bengio, Eric Thibodeau-Laufer, Guillaume Alain, Jason Yosinski
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate these theoretical results with experiments on two image datasets using an architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with simple backprop, without the need for layerwise pretraining. In Section 4 we show an example application of the GSN theory to create a deep GSN whose computational graph resembles the one followed by Gibbs sampling in deep Boltzmann machines (with continuous latent variables), but that can be trained efficiently with back-propagated gradients and without layerwise pretraining. |
| Researcher Affiliation | Academia | Yoshua Bengio (FIND.US@ON.THE.WEB), Eric Thibodeau-Laufer, Guillaume Alain: Département d'informatique et recherche opérationnelle, Université de Montréal, & Canadian Inst. for Advanced Research. Jason Yosinski: Department of Computer Science, Cornell University. |
| Pseudocode | No | The paper describes the proposed framework and its components in prose and with graphical models (e.g., Figure 2), but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper mentions 'The supplemental material provides full details on the experiments and more detailed figures of generated samples,' but it does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The experiments were performed on the MNIST and Toronto Face Database (TFD) datasets, following the setup in Bengio et al. (2013b), where the model generates quantized (binary) pixels. The TFD reference is Susskind, J., Anderson, A., and Hinton, G. E. The Toronto face dataset. Technical Report UTML TR 2010-001, U. Toronto, 2010. |
| Dataset Splits | No | The paper mentions a 'Test set' in the context of evaluation (Table 1), and discusses training on MNIST and TFD datasets, but it does not provide specific details on how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or a detailed splitting methodology). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper describes the theoretical framework and experimental setup, but it does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions). |
| Experiment Setup | Yes | To emulate a sampling procedure similar to Boltzmann machines in which the filled-in missing values can depend on the representations at the top level, the computational graph allows information to propagate both upwards (from input to higher levels) and downwards, giving rise to the computational graph structure illustrated in Figure 2, which is similar to that explored for deterministic recurrent auto-encoders (Seung, 1998; Behnke, 2001; Savard, 2011). Downward weight matrices have been fixed to the transpose of corresponding upward weight matrices. Here we consider the following stochastic non-linearity: h_i = η_out + tanh(η_in + a_i), where a_i is the linear activation for unit i (an affine transformation applied to the input of the unit, coming from the layer below, the layer above, or both) and η_in and η_out are zero-mean Gaussian noises. In the experiments, the graph was unfolded so that 2D sampled reconstructions would be produced, where D is the depth (number of hidden layers). The training loss is the sum of the reconstruction negative log-likelihoods (of target X) over all those reconstruction steps. A minimal code sketch of this setup appears below the table. |
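
Below is a minimal NumPy sketch of the unfolded computational graph described in the Experiment Setup row: noise injected before and after the tanh, downward weights tied to the transposes of the upward weights, 2D unfolding steps, and a summed reconstruction negative log-likelihood. Layer sizes, noise standard deviations, weight initialization, the sequential (rather than alternating even/odd) update order, and the resampling of the visible layer between steps are illustrative assumptions; backpropagation and the optimizer are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 3                            # depth: number of hidden layers
sizes = [784, 500, 500, 500]     # [input] + hidden layer sizes (assumed)
sigma_in, sigma_out = 0.5, 0.5   # std of pre-/post-activation Gaussian noise (assumed)

# Upward weights W[i]: layer i-1 -> layer i; downward weights are the tied transposes W[i].T.
W = [None] + [rng.normal(0.0, 0.01, (sizes[i - 1], sizes[i])) for i in range(1, D + 1)]
b = [np.zeros(s) for s in sizes]  # b[0] is the visible bias used for reconstruction

def noisy_tanh(a):
    """Stochastic non-linearity h_i = eta_out + tanh(eta_in + a_i), with zero-mean Gaussian noises."""
    eta_in = rng.normal(0.0, sigma_in, a.shape)
    eta_out = rng.normal(0.0, sigma_out, a.shape)
    return eta_out + np.tanh(eta_in + a)

def unfold(x, n_steps):
    """Unfold the up/down computational graph and return one sampled reconstruction per step."""
    h = [x] + [np.zeros((x.shape[0], s)) for s in sizes[1:]]
    recons = []
    for _ in range(n_steps):
        # Each hidden layer receives input from the layer below and (if present) the layer above.
        for i in range(1, D + 1):
            a = h[i - 1] @ W[i] + b[i]
            if i < D:
                a = a + h[i + 1] @ W[i + 1].T
            h[i] = noisy_tanh(a)
        # Reconstruction distribution over the binary input, using the tied downward weights.
        x_hat = 1.0 / (1.0 + np.exp(-(h[1] @ W[1].T + b[0])))
        recons.append(x_hat)
        # Feed a sampled (quantized) reconstruction back in for the next step; the paper's
        # exact clamping/resampling schedule during training may differ.
        h[0] = (rng.random(x_hat.shape) < x_hat).astype(x.dtype)
    return recons

def reconstruction_nll(x, recons):
    """Training loss: sum of Bernoulli negative log-likelihoods of target x over all steps."""
    eps = 1e-7
    return sum(-np.sum(x * np.log(r + eps) + (1.0 - x) * np.log(1.0 - r + eps))
               for r in recons)

# Unfold for 2*D steps, as in the quoted setup, on a dummy batch of binarized images.
x = (rng.random((16, 784)) > 0.5).astype(np.float64)
loss = reconstruction_nll(x, unfold(x, n_steps=2 * D))
print(loss)
```

Only the tied weights, the noisy tanh, the 2D unfolding, and the summed reconstruction negative log-likelihood are taken from the quoted setup; everything else in the sketch is a simplification for illustration.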