DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method
Authors: Ahmed Khaled, Konstantin Mishchenko, Chi Jin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To complement our theory, we also show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks. |
| Researcher Affiliation | Collaboration | Ahmed Khaled (Princeton University), Konstantin Mishchenko (Samsung AI Center), Chi Jin (Princeton University) |
| Pseudocode | Yes | Algorithm 1: DoWG: Distance over Weighted Gradients (a sketch of this style of update follows the table) |
| Open Source Code | Yes | We implement DoWG on top of the DoG code (DoWG: https://github.com/rka97/dowg, DoG: https://github.com/formll/dog). |
| Open Datasets | Yes | We train the VGG11 (Simonyan and Zisserman, 2015) and ResNet-50 (He et al., 2016) neural network architectures on CIFAR10 (Krizhevsky, 2009) using PyTorch (Paszke et al., 2019) |
| Dataset Splits | No | The paper uses CIFAR10 and mentions 'Test accuracy' and 'Train accuracy/loss', but does not explicitly describe the dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | Yes | on a single RTX3090 GPU |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not specify the version number of PyTorch or any other software. |
| Experiment Setup | Yes | All methods are used with batch size 256 with no weight decay on a single RTX3090 GPU. We also add comparison against Adam (Kingma and Ba, 2015) with cosine annealing and the standard step size 10^-3. (A configuration sketch follows the table.) |
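
The pseudocode row refers to Algorithm 1 (DoWG: Distance over Weighted Gradients). For orientation only, here is a minimal PyTorch-style sketch of a distance-over-weighted-gradients step; the helper name `dowg_sketch_step`, the state layout, and the small initialization `r_eps` are assumptions of this sketch, not the authors' reference implementation (that lives at https://github.com/rka97/dowg).

```python
import torch

def dowg_sketch_step(params, state, r_eps=1e-6):
    """One DoWG-style update on a list of tensors whose .grad is populated.

    `state` is a plain dict that persists across calls; `r_eps` is a small
    initial distance estimate so the very first step is nonzero (an assumption
    of this sketch, not necessarily the paper's exact choice).
    """
    with torch.no_grad():
        if not state:
            state["x0"] = [p.detach().clone() for p in params]  # initial point
            state["r"] = r_eps   # running max distance from x0 (r_t)
            state["v"] = 0.0     # weighted sum of squared gradient norms (v_t)
        # r_t = max(r_{t-1}, ||x_t - x0||), measured over all parameters jointly.
        dist_sq = sum(((p - x0) ** 2).sum().item() for p, x0 in zip(params, state["x0"]))
        state["r"] = max(state["r"], dist_sq ** 0.5)
        # v_t = v_{t-1} + r_t^2 * ||g_t||^2
        grad_sq = sum((p.grad ** 2).sum().item() for p in params if p.grad is not None)
        state["v"] += state["r"] ** 2 * grad_sq
        # eta_t = r_t^2 / sqrt(v_t);  x_{t+1} = x_t - eta_t * g_t
        lr = state["r"] ** 2 / max(state["v"], 1e-12) ** 0.5  # guard against v_t = 0
        for p in params:
            if p.grad is not None:
                p.add_(p.grad, alpha=-lr)
```

In a training loop this would take the place of `optimizer.step()`: create an empty `state` dict once before training, then after each `loss.backward()` call `dowg_sketch_step(list(model.parameters()), state)`.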
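
The experiment-setup row can be read as the baseline configuration sketched below. Only the batch size 256, the absence of weight decay, the Adam step size 1e-3, and the cosine annealing schedule come from the quoted text; the model choice, transforms, epoch count, and variable names are assumptions of this sketch.

```python
import torch
from torch.utils.data import DataLoader
import torchvision

# CIFAR10 loader with the reported batch size of 256 (data augmentation omitted for brevity).
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                         transform=torchvision.transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=256, shuffle=True)

# Adam baseline as described in the table: standard step size 1e-3, no weight decay,
# with a cosine annealing schedule over the training horizon.
model = torchvision.models.resnet50(num_classes=10)  # the paper also trains VGG11
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
num_epochs = 100  # assumed; the excerpt does not state the number of epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
```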