Universal Gradient Methods for Stochastic Convex Optimization
Authors: Anton Rodomanov, Ali Kavis, Yongtao Wu, Kimon Antonakopoulos, Volkan Cevher
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present some preliminary computational experiments for the proposed methods. ... Least-Squares. ... Logistic regression. ... Non-convex neural networks training. ... We present the result in Figure 2, where we can see that the proposed stochastic universal gradient method can solve non-convex problems as well. |
| Researcher Affiliation | Academia | CISPA Helmholtz Center for Information Security, Saarbrücken, Germany; Institute for Foundations of Machine Learning (IFML), UT Austin, Texas, USA; Laboratory for Information and Inference Systems (LIONS), EPFL, Lausanne, Switzerland. |
| Pseudocode | Yes | Algorithm 1 Universal Line-Search-Free Gradient Method ... Algorithm 2 Universal Stochastic Gradient Method ... Algorithm 3 Universal Stochastic Fast Gradient Method (an illustrative sketch follows this table). |
| Open Source Code | No | The paper does not contain any statement about making its code open-source, nor does it provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | We run the experiment on the real-world diabetes dataset from LIBSVM. ... We run the experiment on the a1a and ionosphere datasets from LIBSVM. ... Specifically, we focus on classification tasks with the cross-entropy loss on the MNIST dataset. ... To be specific, we train a ResNet18 (He et al., 2016) on CIFAR-10 (Krizhevsky & Hinton, 2009). (A data-loading sketch follows this table.) |
| Dataset Splits | No | The paper mentions tuning parameters "by a parameter sweep" and "by sweeping over" certain values, which implies a validation process. However, it does not state the dataset splits used for this process (e.g., percentages or sample counts for the training, validation, and test sets). |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models or memory used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., Python, PyTorch, TensorFlow versions) used in the experiments. |
| Experiment Setup | Yes | A three-layer fully connected network with layer dimensions [28×28, 256, 256, 10] and ReLU activations is selected. We select the minibatch size as 256. The step size of each method is tuned by a parameter sweep over {10, 1, 0.1, 0.01, 0.001, 0.0001}. The diameter of the proposed method is tuned by sweeping over {50, 35, 20, 10, 5}. (A hedged reconstruction follows this table.) |
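
The paper's algorithms are not reproduced in this report, so the following is only a rough illustration of the kind of diameter-parameterized, AdaGrad-norm-style update that universal stochastic gradient methods build on. Everything here is an assumption for illustration: the function names, the exact step-size rule, the projection onto a ball of radius D, and the single-sample logistic oracle are not the paper's Algorithm 2, whose precise constants and averaging scheme differ.

```python
import numpy as np

def universal_sgd_sketch(grad_fn, x0, D, n_steps, rng):
    """Illustrative AdaGrad-norm-style sketch with a diameter parameter D.

    NOTE: not the paper's Algorithm 2; the step-size rule and averaging
    below are simplified stand-ins for the universal-method idea.
    """
    x = x0.copy()
    center = x0.copy()        # assume the feasible set is a ball of radius D around x0
    acc = 0.0                 # running sum of squared stochastic gradient norms
    avg = np.zeros_like(x0)   # running average of iterates
    for k in range(1, n_steps + 1):
        g = grad_fn(x, rng)                  # stochastic gradient oracle
        acc += float(np.dot(g, g))
        step = D / np.sqrt(acc + 1e-12)      # adaptive: no knowledge of L or sigma
        x = x - step * g
        dist = np.linalg.norm(x - center)    # project back onto the ball of radius D
        if dist > D:
            x = center + (D / dist) * (x - center)
        avg += (x - avg) / k
    return avg

# Example stochastic oracle: single-sample logistic-loss gradient
# (matching the paper's logistic-regression experiment in spirit).
def logistic_grad(X, y):
    def grad_fn(w, rng):
        i = rng.integers(len(y))
        xi, yi = X[i], y[i]
        return -yi * xi / (1.0 + np.exp(yi * xi.dot(w)))
    return grad_fn

# Toy usage with synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=100))
w = universal_sgd_sketch(logistic_grad(X, y), np.zeros(5), D=10.0, n_steps=1000, rng=rng)
```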
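The paper does not state how the LIBSVM datasets (diabetes, a1a, ionosphere) were loaded. One common route, assuming scikit-learn is available, is `load_svmlight_file`; the file name below is a placeholder for a file downloaded from the LIBSVM datasets page.

```python
from sklearn.datasets import load_svmlight_file

# Assumes the LIBSVM file (e.g. "a1a") has been downloaded from
# https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
X, y = load_svmlight_file("a1a")  # X: sparse feature matrix, y: labels in {-1, +1}
print(X.shape, y.shape)
```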
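The paper also does not name its deep-learning framework (see the Software Dependencies row). Assuming PyTorch, the stated MNIST architecture and sweep grids could be reconstructed as below; `train_and_eval` is a hypothetical helper, not something the paper provides.

```python
import torch.nn as nn

# Architecture as stated in the paper: layer dimensions
# [28*28, 256, 256, 10] with ReLU activations.
def make_mlp():
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 10),
    )

BATCH_SIZE = 256
STEP_SIZES = [10, 1, 0.1, 0.01, 0.001, 0.0001]  # baseline step-size sweep
DIAMETERS = [50, 35, 20, 10, 5]                 # diameter sweep for the proposed method

# Hypothetical sweep skeleton (train_and_eval is a placeholder):
# results = {lr: train_and_eval(make_mlp(), lr, BATCH_SIZE) for lr in STEP_SIZES}
```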