Techniques for Learning Binary Stochastic Feedforward Neural Networks
Authors: Tapani Raiko, Mathias Berglund, Guillaume Alain, and Laurent Dinh
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments confirm that training stochastic networks is difficult and show that the proposed two estimators perform favorably among all the five known estimators. We propose two experiments as benchmarks for stochastic feedforward networks based on the MNIST handwritten digit dataset (LeCun et al., 1998) and the Toronto Face Database (Susskind et al., 2010). |
| Researcher Affiliation | Academia | Tapani Raiko & Mathias Berglund, Department of Information and Computer Science, Aalto University, Espoo, Finland, {tapani.raiko,mathias.berglund}@aalto.fi; Guillaume Alain & Laurent Dinh, Department of Computer Science and Operations Research, Université de Montréal, Montréal, Canada, guillaume.alain.umontreal@gmail.com, dinhlaur@iro.umontreal.ca |
| Pseudocode | No | The paper describes estimators and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for its methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We propose two experiments as benchmarks for stochastic feedforward networks based on the MNIST handwritten digit dataset (LeCun et al., 1998) and the Toronto Face Database (Susskind et al., 2010). |
| Dataset Splits | Yes | In the MNIST experiments we used a separate validation set to select the learning rate. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper acknowledges the use of Theano ('Theano (Bastien et al., 2012; Bergstra et al., 2010)') but does not specify its version number or any other software dependencies with their versions, which are required for reproducibility. |
| Experiment Setup | Yes | In all of the experiments, we used stochastic gradient descent with a mini-batch size of 100 and momentum of 0.9. We used a learning rate schedule where the learning rate increases linearly from zero to maximum during the first five epochs and back to zero during the remaining epochs. The maximum learning rate was chosen among {0.0001, 0.0003, 0.001, ..., 1}. The models were trained with M ∈ {1, 20}, and during test time we always used M = 100. We used a network structure of 392-200-200-392 and 2304-200-200-2304 in the first and second problem, respectively. |
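
The Experiment Setup row fixes the optimizer (SGD, mini-batch size 100, momentum 0.9), the triangular learning-rate schedule, and the number of stochastic samples M used at train and test time. The sketch below is a plain-NumPy reading of those quoted settings, not the authors' Theano implementation; names such as `n_epochs`, `forward_sample`, and the function signatures are illustrative assumptions, and the elided maximum-learning-rate grid is left as quoted above.

```python
import numpy as np

def learning_rate(epoch, max_lr, n_epochs, warmup_epochs=5):
    """Triangular schedule from the quoted setup: linear ramp from 0 to
    max_lr over the first `warmup_epochs` epochs, then linear decay back
    to 0 over the remaining epochs. `n_epochs` is an assumed total."""
    if epoch < warmup_epochs:
        return max_lr * (epoch + 1) / warmup_epochs
    remaining = n_epochs - warmup_epochs
    return max_lr * max(0.0, (n_epochs - 1 - epoch) / remaining)

def sgd_momentum_step(params, grads, velocities, lr, momentum=0.9):
    """One mini-batch update (mini-batch size 100 in the paper) with
    classical momentum, applied in place to NumPy parameter arrays."""
    for p, g, v in zip(params, grads, velocities):
        v *= momentum
        v -= lr * g
        p += v

def predict(x, forward_sample, M=100):
    """Average M stochastic forward passes of the network; the paper
    uses M = 100 at test time. `forward_sample` is a hypothetical
    callable returning one sampled output for input `x`."""
    return np.mean([forward_sample(x) for _ in range(M)], axis=0)
```

Under this reading, the maximum learning rate would be chosen on the separate validation set noted in the Dataset Splits row, and `learning_rate` would then be evaluated once per epoch with that chosen value.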