Deep Generative Stochastic Networks Trainable by Backprop
Authors: Yoshua Bengio, Eric Thibodeau-Laufer, Guillaume Alain, Jason Yosinski
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate these theoretical results with experiments on two image datasets using an architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with simple backprop, without the need for layerwise pretraining. In Section 4 we show an example application of the GSN theory to create a deep GSN whose computational graph resembles the one followed by Gibbs sampling in deep Boltzmann machines (with continuous latent variables), but that can be trained efficiently with back-propagated gradients and without layerwise pretraining. |
| Researcher Affiliation | Academia | Yoshua Bengio (FIND.US@ON.THE.WEB), Eric Thibodeau-Laufer, Guillaume Alain: Département d'informatique et recherche opérationnelle, Université de Montréal, & Canadian Inst. for Advanced Research. Jason Yosinski: Department of Computer Science, Cornell University. |
| Pseudocode | No | The paper describes the proposed framework and its components in prose and with graphical models (e.g., Figure 2), but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper mentions 'The supplemental material provides full details on the experiments and more detailed figures of generated samples,' but it does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The experiments were performed on the MNIST and Toronto Face Database (TFD) datasets, following the setup in Bengio et al. (2013b), where the model generates quantized (binary) pixels. The TFD reference is Susskind, J., Anderson, A., and Hinton, G. E. The Toronto face dataset. Technical Report UTML TR 2010-001, U. Toronto, 2010. |
| Dataset Splits | No | The paper mentions a 'Test set' in the context of evaluation (Table 1), and discusses training on MNIST and TFD datasets, but it does not provide specific details on how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or a detailed splitting methodology). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper describes the theoretical framework and experimental setup, but it does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions). |
| Experiment Setup | Yes | To emulate a sampling procedure similar to Boltzmann machines in which the filled-in missing values can depend on the representations at the top level, the computational graph allows information to propagate both upwards (from input to higher levels) and downwards, giving rise to the computational graph structure illustrated in Figure 2, which is similar to that explored for deterministic recurrent auto-encoders (Seung, 1998; Behnke, 2001; Savard, 2011). Downward weight matrices have been fixed to the transpose of corresponding upward weight matrices. Here we consider the following stochastic non-linearity: h_i = η_out + tanh(η_in + a_i), where a_i is the linear activation for unit i (an affine transformation applied to the input of the unit, coming from the layer below, the layer above, or both) and η_in and η_out are zero-mean Gaussian noises. In the experiments, the graph was unfolded so that 2D sampled reconstructions would be produced, where D is the depth (number of hidden layers). The training loss is the sum of the reconstruction negative log-likelihoods (of target X) over all those reconstruction steps. A minimal code sketch of this setup appears below the table. |
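
Below is a minimal NumPy sketch of the unfolded computational graph described in the Experiment Setup row: noise injected before and after the tanh, downward weights tied to the transposes of the upward weights, 2D unfolding steps, and a summed reconstruction negative log-likelihood. Layer sizes, noise standard deviations, weight initialization, the sequential (rather than alternating even/odd) update order, and the resampling of the visible layer between steps are illustrative assumptions; backpropagation and the optimizer are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 3                            # depth: number of hidden layers
sizes = [784, 500, 500, 500]     # [input] + hidden layer sizes (assumed)
sigma_in, sigma_out = 0.5, 0.5   # std of pre-/post-activation Gaussian noise (assumed)

# Upward weights W[i]: layer i-1 -> layer i; downward weights are the tied transposes W[i].T.
W = [None] + [rng.normal(0.0, 0.01, (sizes[i - 1], sizes[i])) for i in range(1, D + 1)]
b = [np.zeros(s) for s in sizes]  # b[0] is the visible bias used for reconstruction

def noisy_tanh(a):
    """Stochastic non-linearity h_i = eta_out + tanh(eta_in + a_i), with zero-mean Gaussian noises."""
    eta_in = rng.normal(0.0, sigma_in, a.shape)
    eta_out = rng.normal(0.0, sigma_out, a.shape)
    return eta_out + np.tanh(eta_in + a)

def unfold(x, n_steps):
    """Unfold the up/down computational graph and return one sampled reconstruction per step."""
    h = [x] + [np.zeros((x.shape[0], s)) for s in sizes[1:]]
    recons = []
    for _ in range(n_steps):
        # Each hidden layer receives input from the layer below and (if present) the layer above.
        for i in range(1, D + 1):
            a = h[i - 1] @ W[i] + b[i]
            if i < D:
                a = a + h[i + 1] @ W[i + 1].T
            h[i] = noisy_tanh(a)
        # Reconstruction distribution over the binary input, using the tied downward weights.
        x_hat = 1.0 / (1.0 + np.exp(-(h[1] @ W[1].T + b[0])))
        recons.append(x_hat)
        # Feed a sampled (quantized) reconstruction back in for the next step; the paper's
        # exact clamping/resampling schedule during training may differ.
        h[0] = (rng.random(x_hat.shape) < x_hat).astype(x.dtype)
    return recons

def reconstruction_nll(x, recons):
    """Training loss: sum of Bernoulli negative log-likelihoods of target x over all steps."""
    eps = 1e-7
    return sum(-np.sum(x * np.log(r + eps) + (1.0 - x) * np.log(1.0 - r + eps))
               for r in recons)

# Unfold for 2*D steps, as in the quoted setup, on a dummy batch of binarized images.
x = (rng.random((16, 784)) > 0.5).astype(np.float64)
loss = reconstruction_nll(x, unfold(x, n_steps=2 * D))
print(loss)
```

Only the tied weights, the noisy tanh, the 2D unfolding, and the summed reconstruction negative log-likelihood are taken from the quoted setup; everything else in the sketch is a simplification for illustration.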