Non-convex Stochastic Composite Optimization with Polyak Momentum
Authors: Yuan Gao, Anton Rodomanov, Sebastian U. Stich
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide numerical experiments to validate our theoretical results. |
| Researcher Affiliation | Academia | 1CISPA, Saarbrücken, Germany 2Universität des Saarlandes. |
| Pseudocode | Yes | Algorithm 1 Proximal Gradient Method with Polyak Momentum |
| Open Source Code | No | The paper does not provide any concrete statement or link regarding the open-sourcing of the code for the methodology described in this paper. |
| Open Datasets | Yes | We evaluate the performances of Algorithm 1 and the vanilla stochastic proximal gradient method on the Cifar-10 dataset (Krizhevsky et al., 2014) with the Resnet-18 (He et al., 2016). |
| Dataset Splits | No | The paper mentions using the Cifar-10 dataset and refers to training loss and test accuracy, but it does not explicitly provide details about training/validation/test splits or how data was partitioned for validation purposes. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Resnet-18 and SGD, but it does not specify version numbers for any software dependencies or libraries needed to replicate the experiments. |
| Experiment Setup | Yes | The parameter M is tuned by a grid search in {10^0, 10^1, 10^2, 10^3, 10^4} for all methods, and the momentum parameter γ is tuned by a grid search in {10^-1, 10^-2, 10^-3, 10^-4, 10^-5}. We set the maximum number of iterations to be 10^4, and the tolerance is 0.02. We use a batch size of 256 and run 300 epochs. We use the standard step size parameter M = 10 (corresponding to a learning rate of 0.1) for the experiment. |
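
The Pseudocode and Experiment Setup rows above refer to Algorithm 1, a stochastic proximal gradient method with Polyak momentum, with step size 1/M and momentum parameter γ. Since the paper does not release code, the following is only a minimal sketch of that kind of update rule, assuming the momentum is kept as an exponential moving average of stochastic gradients with weight γ and using an L1 penalty as a purely illustrative choice of the non-smooth term; these implementation details (function names, the `prox_l1` choice, the toy problem) are assumptions, not taken from the paper.

```python
import numpy as np

def prox_l1(x, tau):
    """Proximal operator of tau * ||x||_1 (soft-thresholding); an illustrative
    choice for the non-smooth part of the composite objective."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def stochastic_prox_grad_polyak(grad_fn, x0, M=10.0, gamma=1e-2,
                                n_iters=10_000, prox=prox_l1, reg=0.0, rng=None):
    """Sketch of a stochastic proximal gradient step with Polyak momentum.

    Assumed form (not verbatim from the paper): v is a moving average of
    stochastic gradients with weight gamma, and the proximal step uses step
    size 1/M, so M = 10 corresponds to a learning rate of 0.1.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    v = grad_fn(x, rng)                      # initialize momentum with one stochastic gradient
    for _ in range(n_iters):
        g = grad_fn(x, rng)                  # fresh stochastic gradient at the current iterate
        v = (1.0 - gamma) * v + gamma * g    # Polyak-momentum (moving-average) estimator
        x = prox(x - v / M, reg / M)         # proximal gradient step with step size 1/M
    return x

# Toy usage (hypothetical): mini-batch gradients of f(x) = 0.5 * ||A x - b||^2
A = np.random.default_rng(1).normal(size=(100, 20))
b = A @ np.ones(20)

def grad_fn(x, rng):
    i = rng.integers(0, A.shape[0], size=16)   # mini-batch of rows
    Ai, bi = A[i], b[i]
    return Ai.T @ (Ai @ x - bi) / len(i)

x_hat = stochastic_prox_grad_polyak(grad_fn, np.zeros(20), M=10.0,
                                    gamma=1e-2, n_iters=2000, reg=0.1)
```

With M = 10 the proximal step uses a learning rate of 1/M = 0.1, matching the setting reported in the Experiment Setup row; the grid-searched values of M and γ would be swept over the ranges quoted there.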