Variance Reduced Stochastic Gradient Descent with Neighbors
Authors: Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experimental results on the performance of the different variants of memorization algorithms for variance reduced SGD as discussed in this paper. SAGA has been uniformly superior to SVRG in our experiments, so we compare SAGA and ϵN-SAGA (from Eq. (26)), alongside SGD as a straw man and q-SAGA as a point of reference for speed-ups. We apply least-squares regression on the million song year regression dataset from the UCI repository. This dataset contains n = 515,345 data points, each described by d = 90 input features. We apply logistic regression on the cov and ijcnn1 datasets obtained from the libsvm website. |
| Researcher Affiliation | Academia | Thomas Hofmann, Department of Computer Science, ETH Zurich, Switzerland; Aurelien Lucchi, Department of Computer Science, ETH Zurich, Switzerland; Simon Lacoste-Julien, INRIA Sierra Project-Team, École Normale Supérieure, Paris, France; Brian McWilliams, Department of Computer Science, ETH Zurich, Switzerland |
| Pseudocode | No | The paper describes algorithms but does not include structured pseudocode blocks or formally labeled algorithm figures. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for its methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We apply least-squares regression on the million song year regression dataset from the UCI repository. This dataset contains n = 515,345 data points, each described by d = 90 input features. We apply logistic regression on the cov and ijcnn1 datasets obtained from the libsvm website. The cov dataset contains n = 581,012 data points, each described by d = 54 input features. The ijcnn1 dataset contains n = 49,990 data points, each described by d = 22 input features. |
| Dataset Splits | No | The paper mentions running algorithms in an i.i.d. sampling setting and averaging results over 5 runs, but it does not provide specific details on train, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU/CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using datasets obtained from 'libsvm website' but does not specify any software libraries or frameworks used for implementation, nor their version numbers. |
| Experiment Setup | Yes | We have chosen q = 20 for q-SAGA and ϵN-SAGA. The same setting was used across all data sets and experiments. A step size γ = q/(µn) was used everywhere, except for plain SGD. Note that as K ≫ 1 in all cases, this is close to the optimal value suggested by our analysis; moreover, using a step size of 1/L for SAGA as suggested in previous work [9] did not appear to give better results. For plain SGD, we used a schedule of the form γt = γ0/t with constants optimized coarsely via cross-validation. |
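
The memorization scheme and step size quoted in the rows above can be made concrete with a short sketch. The Python snippet below is an illustrative reconstruction, not the authors' code (none was released, per the Open Source Code row): it runs a SAGA-style variance-reduced update with q-SAGA memorization (refreshing the stored gradients of q points per iteration) on ridge-regularized least squares, using the quoted step size γ = q/(µn). The function name, the regularization constant `mu`, and the data handling are assumptions made for illustration.

```python
import numpy as np

def q_saga_least_squares(X, y, mu=1e-4, q=20, epochs=5, seed=0):
    """Hedged sketch of a SAGA-style update with q-SAGA memorization for
    ridge-regularized least squares; names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # One gradient memory slot per data point, as in SAGA-style memorization.
    mem = np.zeros((n, d))
    mem_mean = mem.mean(axis=0)
    gamma = q / (mu * n)  # step size quoted in the Experiment Setup row
    for _ in range(epochs * n):
        i = rng.integers(n)  # i.i.d. sampling, as in the paper's experiments
        g_i = (X[i] @ w - y[i]) * X[i] + mu * w
        # Variance-reduced direction: current gradient minus its memory,
        # plus the running mean of all memorized gradients.
        v = g_i - mem[i] + mem_mean
        w -= gamma * v
        # q-SAGA memorization: refresh q uniformly chosen memory slots per
        # step (plain SAGA would instead refresh only the sampled point i).
        for j in rng.choice(n, size=min(q, n), replace=False):
            g_j = (X[j] @ w - y[j]) * X[j] + mu * w
            mem_mean += (g_j - mem[j]) / n
            mem[j] = g_j
    return w
```

At the scale of the quoted datasets (e.g. n = 515,345, d = 90 for the song-year data), the q extra gradient evaluations per step are what make q-SAGA a point of reference for speed-ups rather than a practical method in the paper's discussion; ϵN-SAGA instead shares memorized gradients across neighboring points to approximate this effect cheaply.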