L4: Practical loss-based stepsize adaptation for deep learning
Authors: Michal Rolinek, Georg Martius
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its capabilities by conclusively improving the performance of Adam and Momentum optimizers... The performance is validated on multiple architectures including dense nets, CNNs, ResNets, and the recurrent Differential Neural Computer on classical datasets MNIST, fashion MNIST, CIFAR10 and others. |
| Researcher Affiliation | Academia | Michal Rolínek and Georg Martius, Max-Planck-Institute for Intelligent Systems, Tübingen, Germany, michal.rolinek@tuebingen.mpg.de and georg.martius@tuebingen.mpg.de |
| Pseudocode | Yes | see Algorithm 1 in the Supplementary for the pseudocode. |
| Open Source Code | No | The paper mentions that the authors plan 'on releasing a prototype implementation that is easy to use in practice' but does not provide a specific link or an explicit statement in the main text that the code is publicly available. |
| Open Datasets | Yes | The performance is validated on multiple architectures including dense nets, CNNs, ResNets, and the recurrent Differential Neural Computer on classical datasets MNIST, fashion MNIST, CIFAR10 and others. |
| Dataset Splits | No | The paper uses well-known datasets like MNIST, CIFAR-10, and Fashion MNIST, which typically have predefined splits. However, it does not explicitly state training/validation/test split percentages or sample counts, nor does it provide citations specifically for the splits. For the DNC, it explicitly states 'there is no separate test regime'. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU or CPU models, or cloud computing instance types used for running the experiments. It only mentions that there was 'neither any runtime increase nor additional memory requirements' from their adaptation. |
| Software Dependencies | No | The paper mentions TensorFlow-related code for baselines (e.g., 'Tensorflow implementation of ResNets, 2016. Commit 1f34fcaf' and 'TensorFlow documentation') but does not specify version numbers for the software dependencies (such as Python, TensorFlow, or other libraries) used in its own experiments. |
| Experiment Setup | Yes | All other parameters are as follows: for momentum SGD we used a timescale of 10 steps (β = 0.9); for Adam: β1 = 0.9, β2 = 0.999, and ε = 10⁻⁴. We set γ = 0.9, τ = 1000, and γ0 = 0.75 as default settings and we use these values in all our experiments. Batch size in use is 64. |
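
The Experiment Setup row quotes the base-optimizer hyperparameters (momentum β = 0.9; Adam β1 = 0.9, β2 = 0.999, ε = 10⁻⁴) and the paper's loss-tracking defaults (γ = 0.9, τ = 1000, γ0 = 0.75). As a rough illustration of what these settings control, the sketch below wraps the paper's loss-based stepsize rule η = α·(L − L_min)/(g·v) around plain momentum SGD on a toy quadratic. The fraction `alpha`, the simplified running-minimum estimate of `L_min`, and the toy objective are illustrative assumptions for this sketch, not the paper's Algorithm 1.

```python
import numpy as np

# Toy quadratic objective, used only to exercise the stepsize rule.
def loss_and_grad(theta):
    return 0.5 * float(theta @ theta), theta


def l4_momentum_sketch(theta, steps=200, alpha=0.15, beta=0.9, eps=1e-12):
    """Simplified loss-based stepsize adaptation around momentum SGD.

    Core rule: choose eta so the linearized loss drops by a fixed
    fraction alpha of the gap to an estimated minimal loss L_min:
        eta = alpha * (L - L_min) / (g . v)
    where v is the update direction proposed by the base optimizer
    (here: momentum with beta = 0.9, as quoted in the table above).
    The paper's L_min estimate additionally uses gamma, tau and gamma0
    for initialization and slow forgetting; this sketch keeps only a
    plain running minimum, which is a simplifying assumption.
    """
    v = np.zeros_like(theta)
    L0, _ = loss_and_grad(theta)
    L_min = 0.75 * L0                 # gamma0 = 0.75: start below the first loss
    for _ in range(steps):
        L, g = loss_and_grad(theta)
        L_min = min(L_min, L)                      # running estimate of attainable loss
        v = beta * v + g                           # momentum update direction
        eta = alpha * (L - L_min) / (g @ v + eps)  # loss-based stepsize
        theta = theta - eta * v
    return theta


if __name__ == "__main__":
    theta = np.ones(10)
    theta = l4_momentum_sketch(theta)
    print("final loss:", 0.5 * float(theta @ theta))
```

With this rule the stepsize shrinks automatically as the loss approaches the estimated minimum, which is why the quoted defaults are reported to transfer across architectures without per-task tuning.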