Training Neural Networks for and by Interpolation

Authors: Leonard Berrada, Andrew Zisserman, M. Pawan Kumar

ICML 2020

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically compare ALI-G to the optimization algorithms most commonly used in deep learning. Our experiments span a variety of architectures and tasks: (i) learning a differentiable neural computer; (ii) training wide residual networks on SVHN; (iii) training a Bi-LSTM on the Stanford Natural Language Inference data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets.
Researcher Affiliation Collaboration Leonard Berrada 1 Andrew Zisserman 2 M. Pawan Kumar 2 1DeepMind, London, United Kingdom. Work performed while at University of Oxford. 2Department of Engineering Science, University of Oxford, Oxford, United Kingdom.
Pseudocode Yes Algorithm 1 The ALI-G algorithm
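The ALI-G update referenced above (Algorithm 1) computes a Polyak-style adaptive step size from the current loss and gradient, clipped at a maximal learning rate. A minimal sketch of one such update, assuming a smoothing constant `delta` and illustrative function/argument names not taken from the authors' code:

```python
import numpy as np

def alig_step(w, loss, grad, max_lr=0.1, delta=1e-5):
    """One ALI-G-style update (sketch): step size is loss / ||grad||^2,
    capped at a maximal learning rate max_lr. delta guards against
    division by zero; names here are illustrative."""
    gamma = min(loss / (np.dot(grad, grad) + delta), max_lr)
    return w - gamma * grad

# Example on f(w) = 0.5 * w^2 in 1D: loss = 0.5 * w^2, grad = w.
w = np.array([2.0])
w_new = alig_step(w, loss=2.0, grad=np.array([2.0]))
```

Because the interpolation assumption makes the minimal loss close to zero, the uncapped step size drives the loss toward zero without a hand-tuned schedule.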
Open Source Code Yes The code to reproduce our results is publicly available1. 1https://github.com/oval-group/ali-g
Open Datasets Yes training a wide residual network on the SVHN data set; (iii) training a Bi-LSTM on the SNLI data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets. [...] We demonstrate the scalability of ALI-G by training a ResNet-18 (He et al., 2016) on the ImageNet data set.
Dataset Splits Yes From the 73k difficult training examples, we select 6k samples for validation; we use all remaining (both difficult and easy) examples for training, for a total of 598k samples. [...] We use 45k samples for training and 5k for validation.
Hardware Specification No All experiments are performed either on a 12-core CPU (differentiable neural computer), on a single GPU (SVHN, SNLI, CIFAR) or on up to 4 GPUs (ImageNet). No specific models (e.g., Intel Core i7, NVIDIA V100) or detailed specifications are provided.
Software Dependencies No In the TensorFlow (Abadi et al., 2015) experiment, we use the official and publicly available implementation of L4. In the PyTorch (Paszke et al., 2017) experiments, we use our implementation of L4, which we unit-test against the official TensorFlow implementation. No specific version numbers for TensorFlow, PyTorch, or any other software dependencies are provided.
Experiment Setup Yes We vary the initial learning rate as powers of ten between 10^-4 and 10^4 for each method except for L4Adam and L4Mom. [...] The gradient norm is clipped for all methods except for ALI-G, L4Adam and L4Mom. [...] The ℓ2 regularization is cross-validated in {0.0001, 0.0005} for all methods but ALI-G. For ALI-G, the regularization is expressed as a constraint on the ℓ2-norm of the parameters, and its maximal value is set to 50. SGD, ALI-G and BPGrad use a Nesterov momentum of 0.9. All methods use a dropout rate of 0.4 and a fixed budget of 160 epochs.
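The constraint-based regularization quoted above (an ℓ2-norm bound of 50 on the parameters, in place of a weight-decay term) is typically enforced by projecting onto the ℓ2-ball after each update. A minimal sketch, with an illustrative function name not taken from the authors' code:

```python
import numpy as np

def project_l2_ball(w, radius=50.0):
    """Project a parameter vector onto the l2-ball of the given radius
    (sketch). If the parameters already satisfy the constraint, they
    are returned unchanged; otherwise they are rescaled onto the ball.
    The radius of 50 matches the value quoted in the setup above."""
    norm = np.linalg.norm(w)
    if norm > radius:
        w = w * (radius / norm)
    return w

# Example: a vector of norm 60 is rescaled to norm 50.
w = project_l2_ball(np.array([60.0, 0.0]))
```

Expressing regularization as a hard constraint rather than a penalty keeps the loss at zero achievable within the feasible set, which is what the interpolation assumption behind ALI-G requires.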