Variance-Reduced and Projection-Free Stochastic Optimization
Authors: Elad Hazan, Haipeng Luo
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The theoretical improvement is also observed in experiments on real-world datasets for a multiclass classification application. To support our theoretical results, we also conducted experiments on three large real-world datasets for a multiclass classification application. |
| Researcher Affiliation | Academia | Elad Hazan (EHAZAN@CS.PRINCETON.EDU), Princeton University, Princeton, NJ 08540, USA; Haipeng Luo (HAIPENGL@CS.PRINCETON.EDU), Princeton University, Princeton, NJ 08540, USA |
| Pseudocode | Yes | Algorithm 1: Stochastic Variance-Reduced Frank-Wolfe (SVRF); Algorithm 2: STOchastic variance-Reduced Conditional gradient sliding (STORC) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Three datasets are selected from the LIBSVM repository (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/) with a relatively large number of features, categories, and examples, summarized in Table 3. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For most of the parameters in these algorithms, we roughly follow what the theory suggests. For example, the size of the mini-batch of stochastic gradients at round k is set to k^2, k^3, and k respectively for SFW, SCGS, and SVRF, and is fixed to 100 for the other three. The number of iterations between taking two snapshots for the variance-reduced methods (SVRG, SVRF, and STORC) is fixed to 50. The learning rate is set to the typical decaying sequence c/k for SGD and a constant c′ for SVRG, as the original work suggests, for some best-tuned c and c′. |
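
To make the pseudocode row above concrete, here is a minimal Python sketch of the SVRF update, assuming generic batch-gradient functions `grad_batch`/`full_grad` and a linear minimization oracle `lmo`, and using the simplified schedules quoted in the Experiment Setup row (mini-batch size growing like k, snapshot interval 50). The names and defaults are illustrative, not the paper's exact Algorithm 1.

```python
import numpy as np

def svrf(grad_batch, full_grad, lmo, x0, n_samples, T=200, snapshot_every=50, rng=None):
    """Sketch of Stochastic Variance-Reduced Frank-Wolfe (SVRF).

    grad_batch(x, idx): averaged stochastic gradient over sample indices idx
    full_grad(x):       exact gradient of the full objective (snapshot gradient)
    lmo(g):             linear minimization oracle, argmin_{v in feasible set} <g, v>
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    snapshot, snapshot_grad = x.copy(), full_grad(x)
    for k in range(1, T + 1):
        if k % snapshot_every == 0:  # refresh the snapshot every 50 rounds, as in the experiments
            snapshot, snapshot_grad = x.copy(), full_grad(x)
        idx = rng.integers(0, n_samples, size=k)  # mini-batch size grows like k for SVRF
        # variance-reduced estimate: batch gradient corrected by the snapshot gradient
        g = grad_batch(x, idx) - grad_batch(snapshot, idx) + snapshot_grad
        v = lmo(g)                   # projection-free step via the linear oracle
        gamma = 2.0 / (k + 1)        # standard Frank-Wolfe step size
        x = (1.0 - gamma) * x + gamma * v
    return x
```

The linear oracle is what makes the method projection-free: over an ℓ1 ball of radius r, for example, `lmo(g)` simply places mass -r·sign(g_i) on the single coordinate i with the largest |g_i|, which is far cheaper than a projection.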
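Since the datasets come from the LIBSVM repository, they are distributed in SVMLight format and can be loaded directly; a minimal sketch, assuming scikit-learn is installed and using an illustrative filename (the paper's three dataset names are listed in its Table 3, not reproduced here):

```python
from sklearn.datasets import load_svmlight_file

# Illustrative file downloaded from the LIBSVM repository; any of the
# three multiclass datasets would be loaded the same way.
X, y = load_svmlight_file("news20.scale")
print(X.shape, len(set(y)))  # (n_examples, n_features) and number of classes
```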
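The schedules quoted in the Experiment Setup row translate directly into code. Below is a sketch of the per-round mini-batch sizes and step sizes; `c` and `c_prime` stand in for the paper's best-tuned constants, and the default values are placeholders.

```python
def minibatch_size(method: str, k: int) -> int:
    """Mini-batch size at round k: k^2 / k^3 / k for SFW / SCGS / SVRF, fixed 100 otherwise."""
    growing = {"SFW": k**2, "SCGS": k**3, "SVRF": k}
    return growing.get(method, 100)

def step_size(method: str, k: int, c: float = 0.1, c_prime: float = 0.01):
    """Step size at round k; c and c' are placeholders for the best-tuned constants."""
    if method == "SGD":
        return c / k      # typical decaying schedule c/k
    if method == "SVRG":
        return c_prime    # constant step, as the original SVRG work suggests
    return None           # Frank-Wolfe-type methods use gamma_k = 2/(k+1) instead
```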