SGDR: Stochastic Gradient Descent with Warm Restarts
Authors: Ilya Loshchilov, Frank Hutter
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically study its performance on the CIFAR-10 and CIFAR-100 datasets, where we demonstrate new state-of-the-art results at 3.14% and 16.21%, respectively. We also demonstrate its advantages on a dataset of EEG recordings and on a downsampled version of the ImageNet dataset. |
| Researcher Affiliation | Academia | Ilya Loshchilov & Frank Hutter, University of Freiburg, Freiburg, Germany, {ilya,fh}@cs.uni-freiburg.de |
| Pseudocode | No | The paper provides equation (5) for the learning rate schedule (see the sketch after this table), but it does not present a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Our source code is available at https://github.com/loshchil/SGDR |
| Open Datasets | Yes | The CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009) consist of 32×32 color images drawn from 10 and 100 classes, respectively, split into 50,000 train and 10,000 test images. |
| Dataset Splits | No | The paper explicitly mentions "50,000 train and 10,000 test images" for CIFAR-10/100, but does not provide specific details (percentages, sample counts) for a separate validation split. It mentions "validation error" in the discussion, but does not define the corresponding split. |
| Hardware Specification | No | The paper does not specify hardware details such as CPU or GPU models or memory used for the experiments. It mentions "high-performance GPUs" in general terms but gives no specifics. |
| Software Dependencies | No | The paper mentions using "SGD with Nesterov's momentum" and WRNs, but it does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers. |
| Experiment Setup | Yes | For training, Zagoruyko & Komodakis (2016) used SGD with Nesterov's momentum with initial learning rate set to η0 = 0.1, weight decay to 0.0005, dampening to 0, momentum to 0.9 and minibatch size to 128. The learning rate is dropped by a factor of 0.2 at 60, 120 and 160 epochs, with a total budget of 200 epochs. |
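
For context, equation (5) of the paper defines the warm-restart schedule as η_t = η_min + ½·(η_max − η_min)·(1 + cos(π·T_cur/T_i)), where T_cur is reset to zero at each restart and the cycle length T_i is multiplied by T_mult. The snippet below is a minimal Python sketch of that schedule, not the authors' released code; the function name `sgdr_lr` and its default values are illustrative choices (eta_max mirrors the initial learning rate quoted above, and T_0/T_mult are one of the settings the paper explores).

```python
import math

def sgdr_lr(epoch, eta_min=0.0, eta_max=0.1, T_0=10, T_mult=2):
    """Cosine-annealed learning rate with warm restarts (Eq. 5 of the paper).

    Defaults are illustrative: eta_max mirrors the quoted initial learning
    rate, while T_0/T_mult correspond to one of the paper's settings
    (e.g. T_0 in {1, 10}, T_mult in {1, 2}).
    """
    # Locate the current restart cycle: subtract completed cycle lengths
    # until `epoch` falls inside the running cycle of length T_i.
    T_i, T_cur = T_0, epoch
    while T_cur >= T_i:
        T_cur -= T_i
        T_i *= T_mult
    # Eq. (5): cosine annealing from eta_max down to eta_min within the cycle.
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * T_cur / T_i))

# Example: the rate jumps back to eta_max at epochs 10 and 30
# (cycles of length 10, 20, 40, ... epochs).
print([round(sgdr_lr(e), 4) for e in range(0, 35, 5)])
```

In the paper, T_cur is advanced at every batch rather than once per epoch, so the learning rate also decays within epochs; the per-epoch version above keeps the sketch short.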