Training Deep Networks without Learning Rates Through Coin Betting
Authors: Francesco Orabona, Tatiana Tommasi
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical convergence is proven for convex and quasi-convex functions and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms. We run experiments on various datasets and architectures, comparing COCOB with some popular stochastic gradient learning algorithms. |
| Researcher Affiliation | Academia | Francesco Orabona, Department of Computer Science, Stony Brook University, Stony Brook, NY, francesco@orabona.com; Tatiana Tommasi, Department of Computer, Control, and Management Engineering, Sapienza Rome University, Italy, tommasi@dis.uniroma1.it |
| Pseudocode | Yes | Algorithm 1 (COntinuous COin Betting, COCOB) and Algorithm 2 (COCOB-Backprop); see the COCOB-Backprop update sketch after the table. |
| Open Source Code | Yes | We implemented COCOB (following Algorithm 2) in TensorFlow [Abadi et al., 2015] and we used the implementations of the other algorithms provided by this deep learning framework. The accompanying footnote provides the link: https://github.com/bremen79/cocob |
| Open Datasets | Yes | Digits Recognition. As a first test, we tackle handwritten digits recognition using the MNIST dataset [LeCun et al., 1998a]. Object Classification. We use the popular CIFAR-10 dataset [Krizhevsky, 2009] to classify 32×32 RGB images across 10 object categories. Word-level Prediction with RNN. Here we train a Recurrent Neural Network (RNN) on a language modeling task. Specifically, we conduct word-level prediction experiments on the Penn Tree Bank (PTB) dataset [Marcus et al., 1993]. |
| Dataset Splits | Yes | MNIST contains 28×28 grayscale images with 60k training data and 10k test samples. CIFAR-10 has 60k images in total, split into a training/test set of 50k/10k samples. We adopted the medium LSTM [Hochreiter and Schmidhuber, 1997] network architecture described in Zaremba et al. [2014]: it has 2 layers with 650 units per layer and parameters initialized uniformly in [-0.05, 0.05], a dropout of 50% is applied on the non-recurrent connections, and the norm of the gradients (normalized by mini-batch size = 20) is clipped at 5. (For PTB, the paper mentions 929k training words and 73k validation words, implying a train/validation split.) See the split check after the table. |
| Hardware Specification | Yes | The authors thank the Stony Brook Research Computing and Cyberinfrastructure, and the Institute for Advanced Computational Science at Stony Brook University for access to the high-performance SeaWulf computing system, which was made possible by a $1.4M National Science Foundation grant (#1531492). |
| Software Dependencies | No | The paper mentions: 'We implemented COCOB (following Algorithm 2) in Tensorflow [Abadi et al., 2015]'. It only names TensorFlow, without a specific version number. The other algorithms it compares against (AdaGrad, RMSProp, Adadelta, Adam) are used through the same framework, again without version information. |
| Experiment Setup | Yes | For the first network we reproduce the structure described in the multi-layer experiment of [Kingma and Ba, 2015]: it has two fully connected hidden layers with 1000 hidden units each and ReLU activations, with a mini-batch size of 100. The weights are initialized with a centered truncated normal distribution and standard deviation 0.1; the same small value 0.1 is also used as initialization for the bias. For CIFAR-10: we use a batch size of 128 and the input images are simply pre-processed by whitening. For PTB: it has 2 layers with 650 units per layer and parameters initialized uniformly in [-0.05, 0.05], a dropout of 50% is applied on the non-recurrent connections, and the norm of the gradients (normalized by mini-batch size = 20) is clipped at 5. A sketch of the first network follows the table. |
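As a companion to the pseudocode row, here is a minimal NumPy sketch of the per-coordinate COCOB-Backprop update as we read Algorithm 2. The state layout, variable names, and the small initialization of `L` are our choices, and `alpha = 100` is the default value reported in the paper; treat this as a sketch of the coin-betting update, not a transcription of the authors' implementation.

```python
import numpy as np

def cocob_backprop_step(state, grad, w_init, alpha=100.0):
    """One COCOB-Backprop update, applied coordinate-wise to a weight array.

    state = (L, G, reward, theta, w):
      L      -- largest absolute gradient seen so far (init: small eps to avoid 0-division)
      G      -- running sum of absolute gradients (init: 0)
      reward -- accumulated winnings of the coin-betting game, clipped at 0 (init: 0)
      theta  -- running sum of gradients (init: 0)
      w      -- current weights (init: w_init)
    """
    L, G, reward, theta, w = state
    L = np.maximum(L, np.abs(grad))
    G = G + np.abs(grad)
    reward = np.maximum(reward - grad * (w - w_init), 0.0)
    theta = theta + grad
    # Bet a signed fraction of the current wealth (L + reward) on each coordinate;
    # alpha caps how aggressive the bet can be during the first few updates.
    w = w_init - theta / (L * np.maximum(G + L, alpha * L)) * (L + reward)
    return (L, G, reward, theta, w)

# Hypothetical usage on a toy quadratic, just to show how the state is threaded.
w_init = np.zeros(3)
state = (np.full(3, 1e-8), np.zeros(3), np.zeros(3), np.zeros(3), w_init.copy())
target = np.array([1.0, -2.0, 0.5])
for _ in range(500):
    w = state[4]
    grad = w - target              # gradient of 0.5 * ||w - target||^2
    state = cocob_backprop_step(state, grad, w_init)
print(state[4])                    # should move toward `target` with no learning rate tuned
```

Note that the only hyper-parameter surfaced here is `alpha`, which the paper treats as a fixed constant rather than a tuned learning rate.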
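The MNIST and CIFAR-10 splits quoted in the dataset rows are easy to confirm with the Keras loaders bundled with TensorFlow. This is only a convenience check, not the pipeline the authors used, and PTB has no comparable built-in loader so it is omitted.

```python
import tensorflow as tf

# MNIST: 60k training / 10k test images of 28x28 grayscale digits.
(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.mnist.load_data()
print("MNIST:", x_tr.shape, x_te.shape)     # (60000, 28, 28) (10000, 28, 28)

# CIFAR-10: 50k training / 10k test 32x32 RGB images over 10 classes.
(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.cifar10.load_data()
print("CIFAR-10:", x_tr.shape, x_te.shape)  # (50000, 32, 32, 3) (10000, 32, 32, 3)
```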
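The first network in the setup row (the multi-layer MNIST experiment) can be sketched in Keras as below. The paper's own code is plain TensorFlow and pairs this model with COCOB; the softmax output layer, the cross-entropy loss, and the stand-in optimizer are assumptions added here only to make the sketch runnable.

```python
import tensorflow as tf

# Initializers from the setup row: truncated normal (std 0.1) for weights, 0.1 for biases.
w_init = tf.keras.initializers.TruncatedNormal(stddev=0.1)
b_init = tf.keras.initializers.Constant(0.1)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(1000, activation="relu",
                          kernel_initializer=w_init, bias_initializer=b_init),
    tf.keras.layers.Dense(1000, activation="relu",
                          kernel_initializer=w_init, bias_initializer=b_init),
    # 10-way softmax output assumed for digit classification (not spelled out in the row).
    tf.keras.layers.Dense(10, activation="softmax",
                          kernel_initializer=w_init, bias_initializer=b_init),
])

# Stand-in optimizer: COCOB is not a built-in Keras optimizer.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Mini-batch size of 100 as in the setup row; x_train / y_train are hypothetical names.
# model.fit(x_train / 255.0, y_train, batch_size=100, epochs=10)
```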