General Stochastic Networks for Classification

Authors: Matthias Zöhrer, Franz Pernkopf

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 3 we empirically study the influence of hyperparameters of GSNs and present experimental results.
Researcher Affiliation | Academia | Matthias Zöhrer and Franz Pernkopf, Signal Processing and Speech Communication Laboratory, Graz University of Technology; matthias.zoehrer@tugraz.at, pernkopf@tugraz.at
Pseudocode | No | The paper includes diagrams of Markov chains and mathematical equations, but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The code will be made publicly available for reproducing the results. (A promise of a future release; no repository is linked.)
Open Datasets | Yes | In order to evaluate the capabilities of GSNs for supervised learning, we studied MNIST digits [13], variants of MNIST digits [14] and the rectangle datasets [14].
Dataset Splits | Yes | Each variant includes 10,000 labeled training, 2,000 labeled validation, and 50,000 labeled test images.
Hardware Specification | No | All simulations were executed on a GPU with the help of the mathematical expression compiler Theano [31]. The specific GPU model or type is not mentioned.
Software Dependencies | No | All simulations were executed on a GPU with the help of the mathematical expression compiler Theano [31]. No version number for Theano is provided.
Experiment Setup | Yes | In all experiments a three-layer GSN, i.e. GSN-3, with 2000 neurons in each layer, randomly initialized with small Gaussian noise, i.e. 0.01 · N(0, 1), and an MSE loss function for both inputs and targets was used. Regarding optimization, SGD was applied with a learning rate η = 0.1, a momentum term of 0.9, and a multiplicative annealing factor η_{n+1} = 0.99 · η_n per epoch n for the learning rate. A rectifier unit [23] was chosen as activation function. Walkback training with K = 6 steps, using pre- and post-activation Gaussian noise with zero mean and variance σ = 0.1, was performed for 500 training epochs. (This schedule and the walkback chain are sketched in the code below.)
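
To make the quoted training schedule concrete, here is a minimal sketch of the annealed momentum SGD in Python (PyTorch). It is not the authors' implementation, which used Theano and is not public; the placeholder parameter tensor and all identifiers are illustrative assumptions.

    # Hedged sketch of the reported optimization schedule; not the authors' code.
    import torch

    HIDDEN_UNITS = 2000   # "2000 neurons in each layer" (GSN-3)
    INIT_STD = 0.01       # weights initialized with 0.01 * N(0, 1)
    LR0 = 0.1             # initial learning rate eta
    MOMENTUM = 0.9
    ANNEAL = 0.99         # eta_{n+1} = 0.99 * eta_n, applied once per epoch
    EPOCHS = 500

    # Placeholder parameter; a real GSN-3 holds three such weight matrices.
    W = torch.nn.Parameter(INIT_STD * torch.randn(HIDDEN_UNITS, HIDDEN_UNITS))
    opt = torch.optim.SGD([W], lr=LR0, momentum=MOMENTUM)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=ANNEAL)

    for epoch in range(EPOCHS):
        # one epoch of walkback training would go here (see the next sketch)
        sched.step()  # multiplicative learning-rate annealing per epoch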
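
The walkback procedure can be sketched in the same spirit. The snippet below is a rough single-layer simplification, assuming tied weights and zero-mean Gaussian noise of scale 0.1 injected before and after the rectifier, with the MSE reconstruction loss accumulated over K = 6 steps; the paper's GSN-3 runs such a chain over three hidden layers and reconstructs inputs and targets jointly, which this sketch omits.

    # Rough single-layer walkback sketch; names and structure are assumptions.
    import torch
    import torch.nn.functional as F

    SIGMA = 0.1  # scale of the injected zero-mean Gaussian noise
    K = 6        # number of walkback steps

    def noisy_layer(v, W, b, sigma=SIGMA):
        """Visible-to-hidden map with pre- and post-activation Gaussian noise."""
        pre = F.linear(v, W, b)
        pre = pre + sigma * torch.randn_like(pre)      # pre-activation noise
        post = F.relu(pre)                             # rectifier unit [23]
        return post + sigma * torch.randn_like(post)   # post-activation noise

    def walkback_loss(x, W, b_h, b_v, k=K):
        """Accumulate the MSE reconstruction loss over k steps of the chain."""
        loss, v = 0.0, x
        for _ in range(k):
            h = noisy_layer(v, W, b_h)        # corrupt and encode
            v = F.linear(h, W.t(), b_v)       # decode with tied weights
            loss = loss + F.mse_loss(v, x)    # reconstruct the clean input
        return loss

Calling walkback_loss(...).backward() inside the epoch loop above, followed by opt.step(), would complete the hypothetical training loop.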