A Simple Practical Accelerated Method for Finite Sums
Authors: Aaron Defazio
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested our algorithm, which we call Point-SAGA, against SAGA [Defazio et al., 2014a], SDCA [Shalev-Shwartz and Zhang, 2013a], Pegasos/SGD [Shalev-Shwartz et al., 2011] and the catalyst acceleration scheme [Lin et al., 2015]. We selected a set of commonly used datasets from the LIBSVM repository [Chang and Lin, 2011]. Figure 1 shows the function value sub-optimality for each dataset-subset combination. |
| Researcher Affiliation | Industry | Aaron Defazio, Ambiata, Sydney, Australia |
| Pseudocode | Yes | Algorithm 1: Pick some starting point x^0 and step size γ. Initialize each g_i^0 = f_i'(x^0), where f_i'(x^0) is any gradient/subgradient at x^0. Then at step k+1: 1. Pick index j from 1 to n uniformly at random. 2. Update x: z_j^k = x^k + γ[g_j^k − (1/n)Σ_i g_i^k], then x^{k+1} = prox_j^γ(z_j^k). (A runnable sketch of this update appears below the table.) |
| Open Source Code | Yes | Open source code to exactly replicate the experimental results is available at https://github.com/adefazio/point-saga. |
| Open Datasets | Yes | We selected a set of commonly used datasets from the LIBSVM repository [Chang and Lin, 2011]. |
| Dataset Splits | No | The paper mentions using LIBSVM datasets and subsampling, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) nor does it refer to predefined standard splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Cython implementation' but does not specify version numbers for Cython or any other key software libraries or dependencies (e.g., Python, NumPy, PyTorch versions) needed for replication. |
| Experiment Setup | Yes | The step-size parameter for each method (κ for catalyst-SDCA) was chosen using a grid search of powers of 2. The step size that gives the lowest error at the final epoch is used for each method. The L2 regularization constant for each problem was set by hand to ensure f was not in the big data regime n ≥ L/µ; as noted above, all the methods perform essentially the same when n ≥ L/µ. The constant used is noted beneath each plot. (A sketch of this grid search appears below the table.) |
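
The Algorithm 1 update quoted above is mechanical enough to sketch directly. Below is a minimal NumPy sketch, assuming an L2-regularized least-squares loss f_i(x) = ½(a_iᵀx − b_i)² + (µ/2)‖x‖², for which prox_j^γ has a closed form via the Sherman–Morrison identity; the gradient-table update g_j ← (z_j − x^{k+1})/γ follows from the prox optimality condition. The function names and the ridge instantiation are illustrative assumptions, not the paper's released Cython implementation (see the linked repository for that).

```python
import numpy as np

def prox_ridge_point(z, a, b, mu, gamma):
    """Closed-form prox of gamma * f_i for f_i(x) = 0.5*(a@x - b)**2 + 0.5*mu*||x||^2.
    Solves (a a^T + (mu + 1/gamma) I) x = a*b + z/gamma via Sherman-Morrison."""
    c = mu + 1.0 / gamma
    v = a * b + z / gamma
    return (v - a * (a @ v) / (c + a @ a)) / c

def point_saga(A, b, mu, gamma, epochs=20, seed=0):
    """Point-SAGA sketch for ridge regression (illustrative, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    g = -A * b[:, None]        # gradient table at x^0 = 0: grad f_i(0) = -a_i * b_i
    g_bar = g.mean(axis=0)     # running average of the table
    for _ in range(epochs * n):
        j = rng.integers(n)                                 # step 1: uniform random index
        z = x + gamma * (g[j] - g_bar)                      # step 2: z_j^k
        x_new = prox_ridge_point(z, A[j], b[j], mu, gamma)  # x^{k+1} = prox_j^gamma(z_j^k)
        g_new = (z - x_new) / gamma                         # prox optimality: g_new = grad f_j(x^{k+1})
        g_bar += (g_new - g[j]) / n                         # keep the table average current
        g[j] = g_new
        x = x_new
    return x
```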
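
The step-size tuning in the setup row is equally easy to reproduce in outline. A minimal sketch, assuming a callable that runs a method to the final epoch and reports its suboptimality f(x) − f(x*); the grid bounds are an assumption, since the paper does not list the exact range of powers searched.

```python
def grid_search_step_size(final_suboptimality, powers=range(-10, 6)):
    """Return the power-of-2 step size with the lowest final-epoch suboptimality.
    `final_suboptimality` is an assumed callable: gamma -> f(x_final) - f(x_star)."""
    return min((2.0 ** p for p in powers), key=final_suboptimality)
```

For example, combined with the sketch above: `grid_search_step_size(lambda g: ridge_loss(point_saga(A, b, mu, g), A, b, mu) - f_star)`, where `ridge_loss` and `f_star` are assumed helpers computed separately.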