Effect of Activation Functions on the Training of Overparametrized Neural Nets
Authors: Abhishek Panigrahi, Abhishek Shetty, Navin Goyal
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the effect of the choice of activation function (we often just say activation) on the training of overparametrized neural networks. By overparametrized setting we roughly mean that the number of parameters or weights in the networks is much larger than the number of data samples. ... 7 EXPERIMENTS: Synthetic data. We consider n equally spaced data points on S^1, randomly lifted to S^9. We randomly label the data-points from U{-1, 1}. We train a 2-layer neural network in the DZPS setting with mean squared loss, containing 10^6 neurons in the first layer with activations tanh, ReLU, swish and ELU at learning rate 10^-3. The output layer is not trained during gradient descent. In Figure 1(a) and Figure 1(b) we plot the squared loss against the number of epochs trained. Results are averaged over 5 different runs. ... Real data. We consider a random subset of 10^4 images from the CIFAR10 dataset (Krizhevsky & Hinton, 2009). We train a 2-layer network containing 10^5 neurons in the first layer. |
| Researcher Affiliation | Collaboration | Abhishek Panigrahi Microsoft Research India t-abpani@microsoft.com Abhishek Shetty Cornell University shetty@cs.cornell.edu Navin Goyal Microsoft Research India navingo@microsoft.com |
| Pseudocode | No | The paper contains mathematical derivations, proofs, and theoretical analyses, but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Real data. We consider a random subset of 10^4 images from the CIFAR10 dataset (Krizhevsky & Hinton, 2009). |
| Dataset Splits | No | The paper mentions using a 'random subset of 10^4 images from CIFAR10 dataset' for training a 2-layer network and verifying an assumption on data samples, but it does not specify any train/validation/test splits or cross-validation methodology. It does not provide sufficient detail to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU model, CPU type, memory specifications) used for running the experiments. It only generally refers to training neural networks. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). It does not mention any libraries or frameworks used in the experiments. |
| Experiment Setup | Yes | We train a 2-layer neural network in the DZPS setting with mean squared loss, containing 10^6 neurons in the first layer with activations tanh, ReLU, swish and ELU at learning rate 10^-3. ... We observed a difference in the rate of convergence while training a 2-layer network, with both layers trainable, using stochastic gradient descent (SGD) with batch size 256 and cross-entropy loss on the random subset of the CIFAR10 dataset at l.r. 10^-3. (Minimal code sketches of these two setups follow the table.) |
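
For concreteness, here is a minimal PyTorch sketch of the synthetic-data setup quoted above: a 2-layer network in the DZPS setting with a fixed (untrained) output layer, squared loss, and full-batch gradient descent at learning rate 10^-3 over the four activations. The width, epoch count, initialization scales, and the construction of the random lift from S^1 to S^9 are assumptions where the quoted text leaves details open; this is a sketch, not the authors' code.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d = 100, 10          # n data points; S^9 lives in R^10
width = 10_000          # the paper uses 10^6 neurons; shrunk here to keep the sketch light

# n equally spaced points on S^1, lifted to S^9 via a random orthonormal map (assumed construction)
theta = torch.linspace(0.0, 2 * math.pi, n + 1)[:-1]
circle = torch.stack([torch.cos(theta), torch.sin(theta)], dim=1)   # (n, 2), unit norm
lift, _ = torch.linalg.qr(torch.randn(d, 2))                        # random 10x2 matrix with orthonormal columns
X = circle @ lift.T                                                 # (n, 10), still unit norm, i.e. on S^9
y = (torch.randint(0, 2, (n,)) * 2 - 1).float()                     # labels drawn uniformly from {-1, +1}


def make_two_layer(act):
    """2-layer net in the DZPS-style setting: only the first-layer weights are trained."""
    W = nn.Linear(d, width, bias=False)                                      # trainable first layer (default init; scale assumed)
    a = (torch.randint(0, 2, (width,)) * 2 - 1).float() / math.sqrt(width)   # fixed random output weights (assumed scale)
    return W, lambda x: act(W(x)) @ a


activations = {"tanh": torch.tanh, "relu": torch.relu, "swish": nn.SiLU(), "elu": nn.ELU()}

for name, act in activations.items():
    W, f = make_two_layer(act)
    opt = torch.optim.SGD(W.parameters(), lr=1e-3)                  # learning rate 10^-3 as quoted
    for epoch in range(1000):                                       # epoch count is an assumption
        opt.zero_grad()
        loss = ((f(X) - y) ** 2).mean()                             # mean squared loss
        loss.backward()
        opt.step()
    print(f"{name}: final squared loss {loss.item():.5f}")
```

The paper's Figure 1 plots this squared loss against epochs, averaged over 5 runs; the averaging over runs is omitted here.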
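
A similarly hedged sketch of the real-data run: a random 10^4-image subset of CIFAR-10, a 2-layer network with both layers trainable, SGD with batch size 256, cross-entropy loss, and learning rate 10^-3. The width is shrunk from the quoted 10^5 first-layer neurons, and the preprocessing (plain `ToTensor`) and epoch count are assumptions, consistent with the "Dataset Splits" and "Software Dependencies" rows noting that the paper does not pin these down.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

torch.manual_seed(0)

full = datasets.CIFAR10(root="./data", train=True, download=True,
                        transform=transforms.ToTensor())
subset = Subset(full, torch.randperm(len(full))[:10_000].tolist())   # random 10^4-image subset
loader = DataLoader(subset, batch_size=256, shuffle=True)            # batch size 256 as quoted

width = 4096                                                         # paper: 10^5 first-layer neurons
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(3 * 32 * 32, width),
                      nn.Tanh(),                                     # swap in nn.ReLU(), nn.SiLU(), nn.ELU() to compare
                      nn.Linear(width, 10))                          # both layers trainable
opt = torch.optim.SGD(model.parameters(), lr=1e-3)                   # l.r. 10^-3 as quoted
criterion = nn.CrossEntropyLoss()

for epoch in range(10):                                              # epoch count is an assumption
    for x, t in loader:
        opt.zero_grad()
        loss = criterion(model(x), t)
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")
```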