Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Adaptive Gradient Descent without Descent
Authors: Yura Malitsky, Konstantin Mishchenko
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examine its performance on a range of convex and nonconvex problems, including logistic regression and matrix factorization. |
| Researcher Affiliation | Academia | ¹EPFL, Lausanne, Switzerland; ²KAUST, Thuwal, Saudi Arabia. |
| Pseudocode | Yes | Algorithm 1: Adaptive gradient descent |
| Open Source Code | Yes | See https://github.com/ymalitsky/adaptive_gd |
| Open Datasets | Yes | We use mushrooms and covtype datasets to run the experiments. [...] For the experiments we used MovieLens 100K dataset (Harper & Konstan, 2016) [...] train them to classify images from the Cifar10 dataset (Krizhevsky et al., 2009) |
| Dataset Splits | No | The paper uses standard datasets like Cifar10 but does not explicitly provide specific train/validation/test dataset split percentages, sample counts, or explicit methodology for creating these splits in the main text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory, cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' as an implementation framework but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use batch size 128 for all methods. For our method, we observed that $1/L_k$ works better than $1/(2L_k)$. We ran it with $1 + \gamma\theta_k$ in the other factor with values of $\gamma$ from {1, 0.1, 0.05, 0.02, 0.01} and $\gamma = 0.02$ performed the best. |
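
The Experiment Setup row refers to two factors of an adaptive step-size rule: a growth factor involving $\theta_k$ and a local curvature estimate $1/L_k$. The following is a minimal sketch of such a rule on a deterministic problem, not the authors' implementation (see the linked repository for that). The function name `adaptive_gd`, the parameters `gamma` and `local_scale`, the square root in the growth factor, and the estimate of $L_k$ as a ratio of gradient and iterate differences are all assumptions made for illustration.

```python
import numpy as np


def adaptive_gd(grad, x0, lam0=1e-6, n_iters=500, gamma=1.0, local_scale=0.5):
    """Gradient descent whose step size is estimated from past iterates.

    Assumed rule (illustrative, not taken verbatim from the paper):
        lam_k = min( sqrt(1 + gamma * theta_{k-1}) * lam_{k-1},
                     local_scale / L_k ),
        L_k   = ||grad(x_k) - grad(x_{k-1})|| / ||x_k - x_{k-1}||,
        theta_k = lam_k / lam_{k-1}.
    local_scale=0.5 gives 1/(2 L_k); local_scale=1.0 gives 1/L_k.
    """
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    lam_prev = lam0
    theta = np.inf  # no growth limit on the first adaptive step
    x = x_prev - lam_prev * g_prev  # one small plain gradient step to start

    for _ in range(n_iters):
        g = grad(x)
        dx = np.linalg.norm(x - x_prev)
        dg = np.linalg.norm(g - g_prev)

        # Factor 1: limits how fast the step size may grow.
        growth = np.sqrt(1.0 + gamma * theta) * lam_prev
        # Factor 2: inverse of the local curvature estimate L_k.
        local = local_scale * dx / dg if dg > 0 else np.inf

        lam = min(growth, local)
        if not np.isfinite(lam):
            lam = lam_prev  # both factors unbounded: keep the previous step

        x_prev, g_prev = x, g
        x = x - lam * g
        theta = lam / lam_prev
        lam_prev = lam

    return x


# Usage on a small convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose gradient is A x - b and whose minimizer solves A x = b.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_star = adaptive_gd(lambda v: A @ v - b, np.zeros(2))
```

With the defaults (`gamma=1.0`, `local_scale=0.5`) no step size has to be tuned by hand. The setting quoted in the table would correspond to `local_scale=1.0` and `gamma=0.02` in this sketch, used there with mini-batches of size 128; that stochastic variant is a heuristic, and this sketch does not attempt to reproduce it.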