A Simple Practical Accelerated Method for Finite Sums

Authors: Aaron Defazio

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested our algorithm, which we call Point-SAGA, against SAGA [Defazio et al., 2014a], SDCA [Shalev-Shwartz and Zhang, 2013a], Pegasos/SGD [Shalev-Shwartz et al., 2011], and the catalyst acceleration scheme [Lin et al., 2015]. We selected a set of commonly used datasets from the LIBSVM repository [Chang and Lin, 2011]. Figure 1 shows the function value sub-optimality for each dataset-subset combination.
Researcher Affiliation | Industry | Aaron Defazio, Ambiata, Sydney, Australia
Pseudocode | Yes | Algorithm 1: Pick some starting point x^0 and step size γ. Initialize each g_i^0 = f_i'(x^0), where f_i'(x^0) is any gradient/subgradient at x^0. Then at step k+1: 1. Pick index j from 1 to n uniformly at random. 2. Update x: z_j^k = x^k + γ(g_j^k - (1/n) Σ_i g_i^k), then x^{k+1} = prox_j^γ(z_j^k). (A Python sketch of this update appears after the table.)
Open Source Code | Yes | Open source code to exactly replicate the experimental results is available at https://github.com/adefazio/point-saga.
Open Datasets | Yes | We selected a set of commonly used datasets from the LIBSVM repository [Chang and Lin, 2011]. (A loading sketch appears after the table.)
Dataset Splits | No | The paper mentions using LIBSVM datasets and subsampling, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts), nor does it refer to predefined standard splits for reproducibility.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions a 'Cython implementation' but does not specify version numbers for Cython or any other key software dependencies (e.g., Python or NumPy versions) needed for replication.
Experiment Setup | Yes | The step-size parameter for each method (κ for catalyst-SDCA) was chosen using a grid search over powers of 2. The step size that gives the lowest error at the final epoch is used for each method. The L2 regularization constant for each problem was set by hand to ensure f was not in the big data regime n > L/µ; as noted above, all the methods perform essentially the same when n > L/µ. The constant used is noted beneath each plot. (A sketch of this step-size tuning protocol appears after the table.)
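
The update in the Pseudocode row can be illustrated concretely. Below is a minimal NumPy sketch of the Point-SAGA iteration, specialized to L2-regularized least squares (ridge regression) because the proximal operator prox_j^γ then has a closed form via the Sherman-Morrison identity. The names point_saga and prox_ridge and the ridge-regression setting are assumptions made for this illustration; the paper's released code at https://github.com/adefazio/point-saga is a separate Cython implementation covering the losses used in the experiments.

```python
import numpy as np

def prox_ridge(z, a, b, mu, gamma):
    # Closed-form prox of f_j(x) = 0.5*(a @ x - b)**2 + 0.5*mu*||x||^2,
    # i.e. argmin_x 0.5*||x - z||^2 + gamma*f_j(x), via Sherman-Morrison.
    c = z + gamma * b * a
    denom = 1.0 + gamma * mu
    scale = gamma * (a @ c) / (denom + gamma * (a @ a))
    return (c - scale * a) / denom

def point_saga(A, b, mu, gamma, epochs=20, seed=0):
    # Sketch of the quoted Algorithm 1 for ridge regression on data (A, b).
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    # Gradient table: g[i] holds a gradient of f_i at the starting point x^0.
    g = A * (A @ x - b)[:, None] + mu * x
    g_avg = g.mean(axis=0)
    for _ in range(epochs * n):
        j = rng.integers(n)
        z = x + gamma * (g[j] - g_avg)                 # z_j^k
        x_new = prox_ridge(z, A[j], b[j], mu, gamma)   # x^{k+1} = prox_j^gamma(z_j^k)
        g_new = (z - x_new) / gamma                    # refreshed table entry for f_j
        g_avg += (g_new - g[j]) / n                    # keep the running average in sync
        g[j] = g_new
        x = x_new
    return x
```

The gradient-table refresh g_new = (z - x_new) / gamma follows from the optimality condition of the prox step, so no extra gradient evaluation is needed; that bookkeeping step is not visible in the truncated pseudocode quote above.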
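For the Open Datasets row, LIBSVM-format files can be read with scikit-learn's load_svmlight_file. This loader and the file name below are assumptions made for illustration, not necessarily what the paper's code uses.

```python
from sklearn.datasets import load_svmlight_file

# File path is a placeholder; datasets are downloadable from the LIBSVM
# repository cited above [Chang and Lin, 2011].
X, y = load_svmlight_file("dataset.libsvm")  # sparse CSR feature matrix, label vector
```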
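The step-size tuning protocol in the Experiment Setup row (try powers of 2, keep the setting with the lowest final-epoch error) can be sketched as follows; run_method and the range of powers are illustrative assumptions.

```python
import numpy as np

def tune_step_size(run_method, powers=range(-12, 4)):
    # run_method(gamma) is assumed to train one method with step size gamma
    # and return its function-value error at the final epoch.
    best_gamma, best_err = None, np.inf
    for p in powers:
        gamma = 2.0 ** p
        err = run_method(gamma)
        if err < best_err:
            best_gamma, best_err = gamma, err
    return best_gamma, best_err
```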