Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Gradient Descent: The Ultimate Optimizer
Authors: Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments validating this for MLPs, CNNs, and RNNs. |
| Researcher Affiliation | Collaboration | Kartik Chandra (MIT CSAIL, Cambridge, MA); Audrey Xie (MIT CSAIL, Cambridge, MA); Jonathan Ragan-Kelley (MIT CSAIL, Cambridge, MA); Erik Meijer (Meta, Inc., Menlo Park, CA). Equal contribution. Work done in part at Meta, Inc. and in part at Stanford University. |
| Pseudocode | Yes | Below is pseudocode for an SGD optimizer that uses .detach() as we have discussed. The highlighted calls to .detach() correspond to detaching the weights and their gradients. `def SGD.__init__(self, alpha): self.alpha = alpha` `def SGD.step(w): d_w = w.grad.detach(); w = w.detach() - self.alpha.detach() * d_w` |
| Open Source Code | Yes | Finally, we provide a simple PyTorch implementation of this algorithm (see people.csail.mit.edu/kach/gradient-descent-the-ultimate-optimizer). |
| Open Datasets | Yes | We conducted initial experiments on the MNIST dataset (Lecun et al., 1998)... We train a ResNet-20 (He et al., 2016) with and without hyperoptimization on the CIFAR-10 dataset (Krizhevsky, 2012)... We train a character-level RNN ('Char-RNN') on the Tolstoy dataset, as proposed by Karpathy et al. (2015)... |
| Dataset Splits | No | The paper mentions using well-known datasets like MNIST, CIFAR-10, and Tolstoy, but it does not explicitly provide details about specific training, validation, and test splits (e.g., percentages, sample counts, or citations to standard split methodologies) within the text. |
| Hardware Specification | Yes | Each of these experiments was conducted on a single NVIDIA TITAN Xp GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and references a specific commit hash for an SGD optimizer file ('optim/sgd.py, commit ff94c9d'), but it does not provide explicit version numbers for PyTorch or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | We conducted initial experiments on the MNIST dataset... using a neural network with one fully-connected hidden layer of size 128, tanh activations, and a batch size of 256. We trained all networks for 30 epochs... For ResNet-20 on CIFAR-10: optimizer (SGD), step size (0.1), momentum (0.9), and weight decay (10⁻⁴)... Experiments were run for 200 epochs... For Char-RNN on Tolstoy: 2-layer LSTM with 128 hidden nodes... Adam optimizer with α = 2 × 10⁻³, run for 50,000 gradient descent steps. |
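The pseudocode in the Pseudocode row detaches `self.alpha`, so no gradient reaches the step size. The paper's idea is to leave α attached, so that a hypergradient ∂f/∂α can be computed and α itself can be updated by gradient descent.

For plain SGD, this hypergradient has a simple closed form. The update is w_t = w_{t−1} − α ∇f(w_{t−1}), and w_{t−1} and its gradient are detached. So ∂w_t/∂α = −∇f(w_{t−1}), and by the chain rule ∂f(w_t)/∂α = −∇f(w_t) · ∇f(w_{t−1}).

The sketch below hand-codes that formula in dependency-free Python instead of relying on PyTorch autodiff. It illustrates the technique and is not the authors' implementation. The names `hyper_sgd` and `dot`, the hyper-step size `kappa`, and the toy quadratic objective are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def hyper_sgd(
    grad: Callable[[Vector], Vector],
    w: Vector,
    alpha: float,
    kappa: float,
    steps: int,
) -> Tuple[Vector, float]:
    """SGD on the weights w, plus SGD on the step size alpha (illustrative sketch).

    kappa is a hypothetical hyper-step size used only to update alpha.
    """
    # First update is plain SGD; its gradient plays the role of grad f(w_{t-1}).
    prev_g = grad(w)
    w = [wi - alpha * gi for wi, gi in zip(w, prev_g)]
    for _ in range(steps):
        g = grad(w)  # grad f(w_t)
        # dw_t/dalpha = -grad f(w_{t-1}) because w_{t-1} and its gradient are
        # treated as constants (the analogue of .detach() in the pseudocode).
        hypergrad = -dot(g, prev_g)  # df(w_t)/dalpha
        alpha -= kappa * hypergrad   # gradient step on the step size itself
        w = [wi - alpha * gi for wi, gi in zip(w, g)]  # SGD step using the updated alpha
        prev_g = g
    return w, alpha


if __name__ == "__main__":
    # Hypothetical toy objective f(w) = 0.5 * sum(w_i ** 2), whose gradient is w itself.
    final_w, final_alpha = hyper_sgd(lambda v: list(v), [1.0], alpha=0.01, kappa=0.01, steps=100)
    print("w =", final_w, "alpha =", final_alpha)
```

On this toy objective, gradients at consecutive steps point the same way, so the hypergradient is negative. As a result, α grows from its deliberately small starting value while w shrinks toward the minimum at 0.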