Mollifying Networks

Authors: Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we mainly focus on training of difficult-to-optimize models, in particular deep MLPs with sigmoid or tanh activation functions. The details of the experimental procedure are provided in Appendix C.
Researcher Affiliation | Academia | University of Montreal; University of Oxford; Politecnico di Milano
Pseudocode | Yes | Algorithm 1: Activation of a unit i at layer l (see the hedged activation sketch after the table).
Open Source Code | No | We plan to release the source code of the models and experiments at http://github.com/caglar/molly_nets/.
Open Datasets | Yes | We train a thin deep neural network with 72 hidden layers and 100 hidden units on the MNIST dataset (LeCun & Cortes, 1998); the architecture is wired up in the training-setup sketch after the table.
Dataset Splits | No | The paper refers to training and validation losses and early stopping on validation accuracy, but does not explicitly provide split percentages or sample counts for the datasets used.
Hardware Specification | No | No specific hardware details, such as GPU/CPU models or types of computing resources used for the experiments, were mentioned in the paper.
Software Dependencies | No | The paper mentions using Theano and refers to Theano Development Team (2016), but does not provide a specific version number for it or for other software dependencies.
Experiment Setup | Yes | The weights of the models are initialized with Glorot & Bengio initialization (Glorot et al., 2011). We use a learning rate of 4e-4 along with RMSProp. We initialize the a_i parameters of the mollified activation function by sampling from the uniform distribution U[-2, 2]. We use 100 hidden units at each layer with minibatches of size 500. These settings are gathered in the training-setup sketch after the table.
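The Pseudocode row only names Algorithm 1 (the activation of a unit i at layer l), so the following is a minimal sketch of the mollification idea as it reads from the excerpts quoted above, not a transcription of the paper's algorithm: each unit stochastically blends a linear path with a saturating nonlinearity, with the mix driven by the learned per-unit parameters a_i from the Experiment Setup row. The function name mollified_activation, the choice of tanh, the sigmoid link from a_i to the gate probability, and the test-time averaging are all assumptions.

```python
import numpy as np

def mollified_activation(x, a, rng, train=True):
    """Hedged sketch of a mollified unit: a stochastic blend of a linear
    path and a saturating nonlinearity.

    x   : 1-D pre-activation vector for one layer
    a   : learned per-unit parameters (assumed here to set the mixing
          probability; initialized from U[-2, 2] per the Experiment Setup row)
    rng : numpy.random.Generator
    """
    p = 1.0 / (1.0 + np.exp(-a))                 # per-unit probability of the linear path
    if train:
        gate = rng.binomial(1, p, size=x.shape)  # sample a hard gate during training
    else:
        gate = p                                 # use the expected gate at test time
    return gate * x + (1.0 - gate) * np.tanh(x)  # linear path vs. tanh path

# Usage on a toy layer of 100 units (the width used in the paper's thin MLP).
rng = np.random.default_rng(0)
pre_act = rng.standard_normal(100)
a = rng.uniform(-2.0, 2.0, size=100)
h = mollified_activation(pre_act, a, rng)
```

Driving the gate probability toward 0 recovers an ordinary tanh unit, while driving it toward 1 makes the unit linear, which is the easier-to-optimize end of the blend.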
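The Open Datasets and Experiment Setup rows pin down the architecture and optimizer hyperparameters: 72 hidden layers of 100 units trained on MNIST, Glorot-style initialization, RMSProp with a learning rate of 4e-4, and minibatches of 500. The sketch below wires those numbers together in PyTorch purely as a stand-in for the authors' Theano code; it uses plain tanh units instead of the mollified activation, and the MNIST loading and training loop are left out.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the table; the framework and everything else is a stand-in.
DEPTH, WIDTH, LR, BATCH_SIZE = 72, 100, 4e-4, 500

def make_thin_deep_mlp(in_dim=784, n_classes=10):
    """Thin, very deep MLP: 72 hidden layers of 100 units each."""
    layers, dim = [], in_dim
    for _ in range(DEPTH):
        linear = nn.Linear(dim, WIDTH)
        nn.init.xavier_uniform_(linear.weight)   # Glorot-style weight initialization
        nn.init.zeros_(linear.bias)
        layers += [linear, nn.Tanh()]            # plain tanh stands in for the mollified unit
        dim = WIDTH
    layers.append(nn.Linear(dim, n_classes))     # 10-way MNIST classifier head
    return nn.Sequential(*layers)

model = make_thin_deep_mlp()
optimizer = torch.optim.RMSprop(model.parameters(), lr=LR)
loss_fn = nn.CrossEntropyLoss()                  # fed with minibatches of BATCH_SIZE = 500
```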