Adaptive Gradient Descent without Descent

Authors: Yura Malitsky, Konstantin Mishchenko

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine its performance on a range of convex and nonconvex problems, including logistic regression and matrix factorization.
Researcher Affiliation | Academia | EPFL, Lausanne, Switzerland; KAUST, Thuwal, Saudi Arabia.
Pseudocode | Yes | Algorithm 1 Adaptive gradient descent (sketched below)
Open Source Code | Yes | See https://github.com/ymalitsky/adaptive_gd
Open Datasets | Yes | We use mushrooms and covtype datasets to run the experiments. For the experiments we used the MovieLens 100K dataset (Harper & Konstan, 2016), and networks were trained to classify images from the Cifar10 dataset (Krizhevsky et al., 2009).
Dataset Splits | No | The paper uses standard datasets such as Cifar10 but does not provide train/validation/test split percentages, sample counts, or a methodology for creating these splits in the main text.
Hardware Specification | No | The paper does not describe the hardware (e.g., CPU or GPU models, memory, cloud instance types) used to run the experiments.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2017) as the implementation framework but does not provide version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We use batch size 128 for all methods. For our method, we observed that 1/L_k works better than 1/(2L_k). We ran it with 1 + γθ_k in the other factor, with values of γ from {1, 0.1, 0.05, 0.02, 0.01}, and γ = 0.02 performed the best. (See the step-size sketch below.)
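For context, the Algorithm 1 referenced in the Pseudocode row can be sketched in a few lines. The sketch below assumes the paper's update rule lambda_k = min{ sqrt(1 + theta_{k-1}) * lambda_{k-1}, ||x_k - x_{k-1}|| / (2 ||grad f(x_k) - grad f(x_{k-1})||) } with theta_k = lambda_k / lambda_{k-1}; the function name, default arguments, and example problem are illustrative choices, not the authors' implementation (their code is at the repository linked above).

    import numpy as np

    def adaptive_gd(grad, x0, lam0=1e-7, n_iters=1000):
        """Minimal sketch of adaptive gradient descent (Algorithm 1).

        grad : callable returning the gradient of f at a point (numpy array)
        x0   : starting point
        lam0 : small initial step size; theta starts at +inf, so the first
               adaptive step is decided by the local curvature estimate alone
        """
        x_prev, g_prev = x0, grad(x0)
        lam_prev, theta_prev = lam0, np.inf
        x = x_prev - lam_prev * g_prev            # plain gradient step to start
        for _ in range(n_iters):
            g = grad(x)
            # local Lipschitz estimate L_k = ||g_k - g_{k-1}|| / ||x_k - x_{k-1}||
            L = np.linalg.norm(g - g_prev) / np.linalg.norm(x - x_prev)
            # step size: smaller of the growth rule and 1 / (2 L_k)
            lam = min(np.sqrt(1.0 + theta_prev) * lam_prev, 1.0 / (2.0 * L))
            x_prev, g_prev = x, g
            x = x - lam * g
            theta_prev, lam_prev = lam / lam_prev, lam   # theta_k = lam_k / lam_{k-1}
        return x

    # Example: minimize a simple ill-conditioned quadratic f(x) = 0.5 * x^T A x
    A = np.diag([1.0, 10.0, 100.0])
    x_min = adaptive_gd(lambda x: A @ x, x0=np.ones(3))

Note that the sketch does not handle edge cases such as x_k == x_{k-1}, and no Lipschitz constant or learning rate needs to be supplied beyond the small initial lam0.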
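The Experiment Setup row quotes a heuristic modification used for the Cifar10 runs: 1/L_k in place of 1/(2 L_k), and a damped growth factor with γ = 0.02. Reading "the other factor" as the growth factor sqrt(1 + theta_{k-1}) from Algorithm 1, and assuming the square root and indexing carry over (the excerpt alone does not settle this), the step-size rule might look as follows; the function name is hypothetical.

    import numpy as np

    def step_size_variant(lam_prev, theta_prev, L, gamma=0.02):
        """Heuristic step-size rule quoted in the Experiment Setup row:
        1/L_k instead of 1/(2 L_k), and a damped growth factor with gamma,
        where gamma = 0.02 is reported as the best of the values tried.
        L is the local Lipschitz estimate ||g_k - g_{k-1}|| / ||x_k - x_{k-1}||."""
        return min(np.sqrt(1.0 + gamma * theta_prev) * lam_prev, 1.0 / L)

In the stochastic setting described in that row, the gradients entering L would be mini-batch gradients (batch size 128, as quoted).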