Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Non-convex Stochastic Composite Optimization with Polyak Momentum

Authors: Yuan Gao, Anton Rodomanov, Sebastian U. Stich

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we provide numerical experiments to validate our theoretical results.
Researcher Affiliation | Academia | ¹CISPA, Saarbrücken, Germany; ²Universität des Saarlandes
Pseudocode | Yes | Algorithm 1: Proximal Gradient Method with Polyak Momentum (see the sketch after this table)
Open Source Code | No | The paper does not provide any concrete statement or link regarding the open-sourcing of the code for the methodology described in this paper.
Open Datasets | Yes | We evaluate the performances of Algorithm 1 and the vanilla stochastic proximal gradient method on the Cifar-10 dataset (Krizhevsky et al., 2014) with the Resnet-18 (He et al., 2016).
Dataset Splits | No | The paper mentions using the Cifar-10 dataset and refers to training loss and test accuracy, but it does not explicitly describe the training/validation/test splits or how the data was partitioned for validation.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions software such as Resnet-18 and SGD, but it does not specify version numbers for any software dependencies or libraries needed to replicate the experiments.
Experiment Setup | Yes | The parameter M is tuned by a grid search in {10^0, 10^1, 10^2, 10^3, 10^4} for all methods, and the momentum parameter γ is tuned by a grid search in {10^-1, 10^-2, 10^-3, 10^-4, 10^-5}. We set the maximum number of iterations to be 10^4, and the tolerance is 0.02. We use a batch size of 256 and run 300 epochs. We use the standard step size parameter M = 10 (corresponding to a learning rate of 0.1) for the experiment. (See the configuration sketch below.)
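The Pseudocode row refers to Algorithm 1, a proximal gradient method with Polyak momentum. As orientation, here is a minimal PyTorch sketch assuming the averaged-gradient form of heavy-ball momentum, v_t = (1 - γ) v_{t-1} + γ g_t followed by x_{t+1} = prox_{ψ/M}(x_t - v_t / M), with step size 1/M. This update form, the l1 composite term, and the helper names stoch_grad and prox_l1 are illustrative assumptions, not the paper's verbatim algorithm.

```python
import torch

def prox_l1(x, lam):
    # Soft-thresholding: proximal operator of lam * ||x||_1.
    # The composite term psi is a hypothetical choice for illustration.
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)

def prox_grad_polyak(x0, stoch_grad, prox, M=10.0, gamma=1e-3, iters=10**4):
    """Sketch of a stochastic proximal gradient step with Polyak momentum.

    Assumed recursion (averaged-gradient form of heavy-ball momentum):
        v_t     = (1 - gamma) * v_{t-1} + gamma * g_t
        x_{t+1} = prox_{psi / M}(x_t - v_t / M)
    """
    x = x0.clone()
    v = torch.zeros_like(x)
    for _ in range(iters):
        g = stoch_grad(x)                    # stochastic gradient estimate of f at x
        v = (1.0 - gamma) * v + gamma * g    # Polyak momentum / gradient averaging
        x = prox(x - v / M, 1.0 / M)         # proximal step with step size 1/M
    return x

# Toy usage on f(x) = ||x||^2 / 2 with noisy gradients (illustrative only):
x0 = torch.randn(100)
noisy_grad = lambda x: x + 0.1 * torch.randn_like(x)
x_final = prox_grad_polyak(x0, noisy_grad, prox_l1, M=10.0, gamma=1e-3, iters=1000)
```

With the reported choice M = 10, the proximal step size 1/M matches the stated learning rate of 0.1.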
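The grids and settings quoted in the Experiment Setup and Open Datasets rows can also be restated as a configuration snippet. Only the numeric values below are taken from the table; the variable names and dictionary keys are hypothetical.

```python
# Hyperparameter grids quoted in the Experiment Setup row.
M_grid     = [10**k for k in range(5)]      # step size parameter M in {10^0, ..., 10^4}
gamma_grid = [10**-k for k in range(1, 6)]  # momentum gamma in {10^-1, ..., 10^-5}

config = {
    "dataset": "CIFAR-10",    # Krizhevsky et al., 2014
    "model": "ResNet-18",     # He et al., 2016
    "batch_size": 256,
    "epochs": 300,
    "max_iterations": 10**4,
    "tolerance": 0.02,
    "M": 10,                  # chosen step size parameter
    "lr": 0.1,                # learning rate = 1/M
}
```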