DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

Authors: Ahmed Khaled, Konstantin Mishchenko, Chi Jin

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To complement our theory, we also show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks. |
| Researcher Affiliation | Collaboration | Ahmed Khaled (Princeton University), Konstantin Mishchenko (Samsung AI Center), Chi Jin (Princeton University) |
| Pseudocode | Yes | Algorithm 1: DoWG: Distance over Weighted Gradients (see the sketch after the table) |
| Open Source Code | Yes | We implement DoWG on top of the DoG code (DoWG: https://github.com/rka97/dowg, DoG: https://github.com/formll/dog). |
| Open Datasets | Yes | We train the VGG11 (Simonyan and Zisserman, 2015) and ResNet-50 (He et al., 2016) neural network architectures on CIFAR10 (Krizhevsky, 2009) using PyTorch (Paszke et al., 2019). |
| Dataset Splits | No | The paper uses CIFAR10 and mentions 'Test accuracy' and 'Train accuracy/loss', but does not explicitly describe the dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | Yes | Experiments run on a single RTX3090 GPU. |
| Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) but does not specify the version of PyTorch or any other software. |
| Experiment Setup | Yes | All methods are used with batch size 256 with no weight decay on a single RTX3090 GPU. We also add a comparison against Adam (Kingma and Ba, 2015) with cosine annealing and the standard step size 10^-3. (A hedged reconstruction appears after the table.) |
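
The Pseudocode row above refers to Algorithm 1 (DoWG: Distance over Weighted Gradients). As a reading aid, here is a minimal NumPy sketch of that update rule: the running maximum distance from the initial point weights the accumulated squared gradient norms, and the step size is the squared distance estimate divided by the square root of that weighted sum. The initial distance estimate `r_eps`, the `grad_fn` interface, and the toy quadratic in the usage line are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def dowg(grad_fn, x0, num_steps=1000, r_eps=1e-4):
    """Sketch of the DoWG update (Distance over Weighted Gradients).

    grad_fn : callable returning the gradient at a point (assumed interface).
    r_eps   : small initial distance estimate; the exact value here is an
              illustrative assumption, not prescribed by the paper.
    """
    x = x0.astype(float).copy()
    r_bar = r_eps  # running max of the distance ||x_t - x_0||
    v = 0.0        # running sum of r_bar_t^2 * ||g_t||^2
    for _ in range(num_steps):
        g = grad_fn(x)
        r_bar = max(r_bar, np.linalg.norm(x - x0))
        v += r_bar ** 2 * np.linalg.norm(g) ** 2
        eta = r_bar ** 2 / np.sqrt(v)  # parameter-free step size
        x = x - eta * g
    return x

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is x (illustrative only).
x_final = dowg(lambda x: x, x0=np.ones(10))
```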
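
The Experiment Setup row reports batch size 256, no weight decay, and an Adam baseline with cosine annealing at step size 10^-3 on CIFAR10. The snippet below is a hedged PyTorch reconstruction of that baseline; the epoch count, data transforms, device handling, and the use of the stock torchvision ResNet-50 (rather than a CIFAR-adapted variant) are assumptions not specified in the table.

```python
import torch
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# CIFAR10 training data; the transform choice here is an assumption.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)

# Stock torchvision ResNet-50 with a 10-class head (assumed; the paper may
# adapt the architecture for 32x32 inputs).
model = torchvision.models.resnet50(num_classes=10).to(device)

# Adam baseline: lr 1e-3, no weight decay, cosine annealing (as reported).
num_epochs = 100  # assumption: not specified in the table above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()
        optimizer.step()
    scheduler.step()
```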