Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method
Authors: Ahmed Khaled, Konstantin Mishchenko, Chi Jin
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To complement our theory, we also show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks. |
| Researcher Affiliation | Collaboration | Ahmed Khaled Princeton University Konstantin Mishchenko Samsung AI Center Chi Jin Princeton University |
| Pseudocode | Yes | Algorithm 1: DoWG: Distance over Weighted Gradients |
| Open Source Code | Yes | implement DoWG (https://github.com/rka97/dowg) on top of the DoG code (https://github.com/formll/dog). |
| Open Datasets | Yes | We train the VGG11 (Simonyan and Zisserman, 2015) and ResNet-50 (He et al., 2016) neural network architectures on CIFAR10 (Krizhevsky, 2009) using PyTorch (Paszke et al., 2019) |
| Dataset Splits | No | The paper uses CIFAR10 and mentions 'Test accuracy' and 'Train accuracy/loss', but does not explicitly describe the dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | Yes | on a single RTX3090 GPU |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not specify the version number of PyTorch or any other software. |
| Experiment Setup | Yes | All methods are used with batch size 256 with no weight decay on a single RTX3090 GPU. We also add comparison against Adam (Kingma and Ba, 2015) with cosine annealing and the standard step size 10^-3. |
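
The Adam baseline in the Experiment Setup row can be illustrated with a minimal sketch of a cosine-annealed step size. Only the batch size (256), the absence of weight decay, and the base step size (10^-3) come from the table; the function name `cosine_annealed_lr`, the `total_steps` horizon, and the `min_lr` floor of zero are assumptions made for illustration, and the standard cosine formula is used here rather than anything confirmed by the paper.

```python
import math

# Values quoted in the Experiment Setup row.
BATCH_SIZE = 256
WEIGHT_DECAY = 0.0
ADAM_BASE_LR = 1e-3


def cosine_annealed_lr(step, total_steps, base_lr=ADAM_BASE_LR, min_lr=0.0):
    """Return the step size at `step` under a standard cosine annealing schedule.

    `total_steps` and `min_lr` are hypothetical: the table does not state
    the training horizon or the final step size.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so that steps outside [0, total_steps] stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_annealed_lr(step, total))
```

The schedule starts at the base step size, passes through half of it at the midpoint, and decays to `min_lr` at the end of training. In a PyTorch run, the same shape would normally come from a built-in scheduler rather than a hand-written function.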