Mollifying Networks
Authors: Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we mainly focus on the training of difficult-to-optimize models, in particular deep MLPs with sigmoid or tanh activation functions. The details of the experimental procedure are provided in Appendix C. |
| Researcher Affiliation | Academia | 1 University of Montreal, 2 University of Oxford, 3 Politecnico di Milano |
| Pseudocode | Yes | Algorithm 1: Activation of a unit i at layer l. |
| Open Source Code | No | We plan to release the source code of the models and experiments under http://github.com/caglar/molly_nets/. |
| Open Datasets | Yes | We train a thin deep neural network on the MNIST (LeCun & Cortes, 1998) dataset with 72 hidden layers and 100 hidden units. |
| Dataset Splits | No | The paper refers to training and validation losses and early stopping on validation accuracy, but does not explicitly provide split percentages or sample counts for the datasets used. |
| Hardware Specification | No | No specific hardware details such as GPU/CPU models or types of computing resources used for experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions using 'Theano' and refers to 'Theano Development Team (2016)', but does not provide a specific version number for it or other software dependencies. |
| Experiment Setup | Yes | The weights of the models are initialized with Glorot & Bengio initialization (Glorot et al., 2011). We use a learning rate of 4e-4 along with RMSProp. We initialize the a_i parameters of the mollified activation function by sampling them from a uniform distribution U[-2, 2]. We used 100 hidden units at each layer with minibatches of size 500. (A hedged code sketch of this setup follows the table.) |
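
The Experiment Setup row above spells out a concrete training configuration. The following is a minimal sketch of that baseline, assuming PyTorch rather than the authors' Theano code, and reproducing only the quoted hyperparameters: a thin MLP of 72 sigmoid hidden layers with 100 units each, Glorot (Xavier) initialization, RMSProp with learning rate 4e-4, and minibatches of size 500 on flattened MNIST inputs. The mollified activation of Algorithm 1 and its a_i parameters (sampled from U[-2, 2]) are deliberately omitted, since their exact form is not quoted in this table.

```python
# Hedged sketch only: a plain deep sigmoid MLP with the hyperparameters quoted
# in the Experiment Setup row. This is NOT the authors' mollified network or
# their Theano implementation.
import torch
import torch.nn as nn

N_HIDDEN_LAYERS = 72   # "72 hidden layers" (quoted)
HIDDEN_UNITS = 100     # "100 hidden units at each layer" (quoted)
BATCH_SIZE = 500       # "minibatches of size 500" (quoted)
LEARNING_RATE = 4e-4   # "learning rate of 4e-4 along with RMSProp" (quoted)

layers = []
in_dim = 28 * 28       # flattened MNIST images (assumed input encoding)
for _ in range(N_HIDDEN_LAYERS):
    linear = nn.Linear(in_dim, HIDDEN_UNITS)
    nn.init.xavier_uniform_(linear.weight)   # Glorot & Bengio initialization
    nn.init.zeros_(linear.bias)
    layers += [linear, nn.Sigmoid()]
    in_dim = HIDDEN_UNITS
layers.append(nn.Linear(in_dim, 10))          # 10 MNIST classes
model = nn.Sequential(*layers)

optimizer = torch.optim.RMSprop(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()

# One dummy training step on random data, standing in for an MNIST minibatch.
x = torch.randn(BATCH_SIZE, 28 * 28)
y = torch.randint(0, 10, (BATCH_SIZE,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note that a plain 72-layer sigmoid MLP of this kind is exactly the hard-to-optimize baseline the paper targets, so without the mollified activations and their annealing schedule this sketch would be expected to train poorly; it is included only to make the quoted hyperparameters concrete.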