Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization

Authors: Michael Metel, Akiko Takeda

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also compare our algorithms' performance in practice for empirical risk minimization. No numerical experiments were conducted in (Xu et al., 2018). We implemented all algorithms for an application in empirical risk minimization and found that the simplest algorithm to implement also performed the best in practice. We conducted experiments comparing our algorithms to those of (Xu et al., 2018) for the problem of binary classification as described in Section 6, on the datasets a9a (Fan, 2018) and MNIST (LeCun, 1998).
Researcher Affiliation | Academia | (1) RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; (2) Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1: Mini-batch stochastic gradient algorithm (MBSGA). Input: w^1 ∈ R^d, N ∈ Z_{>0}, α, θ ∈ R... Algorithm 2: Variance reduced stochastic gradient algorithm (VRSGA). Input: w^1 ∈ R^d, N ∈ Z_{>0}, α, θ ∈ R...
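The row above reproduces only the input lines of the two algorithms. As a rough illustration of the structure MBSGA follows (a mini-batch stochastic gradient step followed by a proximal-type step on the non-smooth regularizer), here is a minimal Python/NumPy sketch. The names grad_fn, prox_fn, the single step size gamma, and the uniformly sampled output iterate are assumptions made for illustration; the exact updates, the roles of α and θ, and the variance-reduced variant (VRSGA) are specified in Algorithms 1 and 2 of the paper.

```python
import numpy as np

def minibatch_prox_sgd(grad_fn, prox_fn, w1, n, num_iter, batch_size, gamma, seed=0):
    """Illustrative mini-batch proximal SGD loop (not the exact MBSGA update).

    grad_fn(w, batch): stochastic gradient of the smooth loss on a mini-batch.
    prox_fn(v, gamma): proximal step for the (possibly non-convex) regularizer.
    Both callables are assumed to be supplied by the user.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w1, dtype=float)
    iterates = [w.copy()]
    for _ in range(num_iter):
        batch = rng.choice(n, size=batch_size, replace=False)  # sample a mini-batch of indices
        g = grad_fn(w, batch)                                   # stochastic gradient of the smooth part
        w = prox_fn(w - gamma * g, gamma)                       # gradient step + proximal step
        iterates.append(w.copy())
    # Non-convex analyses often output a uniformly sampled iterate rather than
    # the last one; MBSGA's actual output rule is given in the paper.
    return iterates[rng.integers(len(iterates))]
```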
Open Source Code | No | The paper does not provide access to source code for the described methodology.
Open Datasets | Yes | We conducted experiments comparing our algorithms to those of (Xu et al., 2018) for the problem of binary classification as described in Section 6, on the datasets a9a (Fan, 2018) and MNIST (LeCun, 1998).
Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, or detailed splitting methodology) needed to reproduce the partitioning into training, validation, and test sets.
Hardware Specification | Yes | All experiments were conducted using MATLAB 2017b on a Mac Pro with a 2.7 GHz 12-core Intel Xeon E5 processor and 64 GB of RAM.
Software Dependencies | No | The paper mentions MATLAB 2017b but does not provide version numbers for any other software components or libraries required for reproducibility.
Experiment Setup | Yes | The algorithms were initially run taking e = 15 effective passes over the data for a9a and e = 9 for MNIST. These values were adjusted so that all algorithms ended at approximately the same time. The regularizer's parameters were chosen as κ = 1/d and ν = 1. This parameter was estimated by doing 50 iterations of MBSGA with step size γ = 1/L_{E_λ}, using a different random seed than was used for the experiments, and computing the sample estimate σ̂_k at each iteration with the M samples used in the algorithm.
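As a concrete reading of the setup quoted above, the snippet below sketches how a sample estimate σ̂_k of the stochastic-gradient standard deviation could be computed from the M per-sample gradients at an iterate, together with the stated regularizer parameters κ = 1/d and ν = 1. The estimator shown (root mean squared deviation from the mini-batch mean gradient) and the dimension value are illustrative assumptions; the paper's exact definition of σ̂_k should be consulted.

```python
import numpy as np

def estimate_sigma_hat(per_sample_grads):
    """Sample estimate of the stochastic-gradient standard deviation.

    per_sample_grads: (M, d) array whose rows are individual gradient samples
    at the current iterate. The exact estimator used for sigma_hat_k in the
    paper may differ; this is one standard choice.
    """
    mean_grad = per_sample_grads.mean(axis=0)
    sq_devs = np.sum((per_sample_grads - mean_grad) ** 2, axis=1)
    return float(np.sqrt(sq_devs.mean()))

# Regularizer parameters as stated in the experiment setup (d = problem dimension).
d = 123          # illustrative placeholder for the feature dimension
kappa = 1.0 / d
nu = 1.0
```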