Variance-Reduced and Projection-Free Stochastic Optimization

Authors: Elad Hazan, Haipeng Luo

ICML 2016

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "The theoretical improvement is also observed in experiments on real-world datasets for a multiclass classification application." "To support our theoretical results, we also conducted experiments on three large real-world datasets for a multiclass classification application."

Researcher Affiliation | Academia | "Elad Hazan (EHAZAN@CS.PRINCETON.EDU), Princeton University, Princeton, NJ 08540, USA; Haipeng Luo (HAIPENGL@CS.PRINCETON.EDU), Princeton University, Princeton, NJ 08540, USA"

Pseudocode | Yes | "Algorithm 1: Stochastic Variance-Reduced Frank-Wolfe (SVRF)"; "Algorithm 2: STOchastic variance-Reduced Conditional gradient sliding (STORC)"

Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.

Open Datasets | Yes | "Three datasets are selected from the LIBSVM repository with a relatively large number of features, categories, and examples, summarized in Table 3." (LIBSVM repository: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/)

Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology).

Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, or memory amounts) for the machines used to run its experiments.

Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.

Experiment Setup | Yes | "For most of the parameters in these algorithms, we roughly follow what the theory suggests. For example, the size of the mini-batch of stochastic gradients at round k is set to k², k³, and k respectively for SFW, SCGS, and SVRF, and is fixed to 100 for the other three. The number of iterations between taking two snapshots for the variance-reduced methods (SVRG, SVRF, and STORC) is fixed to 50. The learning rate is set to the typical decaying sequence c/k for SGD and a constant c′ for SVRG, as the original work suggests, for some best-tuned c and c′."
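The experiment-setup row describes SVRF's parameter schedule: a mini-batch of stochastic gradients of size k at round k, and a full-gradient snapshot refreshed every 50 iterations. A minimal sketch of a stochastic variance-reduced Frank-Wolfe loop under those two settings follows; the least-squares objective, the probability-simplex constraint, the 2/(k+2) step size, and all names here are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

def svrf(A, b, T=150, snapshot_every=50, seed=0):
    """Sketch of stochastic variance-reduced Frank-Wolfe (SVRF-style).

    Minimizes (1/n) * ||A x - b||^2 over the probability simplex.
    The mini-batch size k and the 50-iteration snapshot interval follow
    the experiment setup quoted above; everything else is illustrative.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.full(d, 1.0 / d)                    # start at the simplex center
    snap_x = x.copy()
    snap_grad = A.T @ (A @ snap_x - b) / n     # full gradient at the snapshot

    def minibatch_grad(point, idx):
        # Stochastic gradient of the least-squares loss on a mini-batch.
        Ai = A[idx]
        return Ai.T @ (Ai @ point - b[idx]) / len(idx)

    for k in range(1, T + 1):
        if k % snapshot_every == 0:            # refresh the snapshot
            snap_x = x.copy()
            snap_grad = A.T @ (A @ snap_x - b) / n
        idx = rng.integers(0, n, size=min(k, n))   # mini-batch of size k
        # Variance-reduced gradient estimate: g(x) - g(snapshot) + full grad.
        g = minibatch_grad(x, idx) - minibatch_grad(snap_x, idx) + snap_grad
        # Linear minimization oracle on the simplex: a single vertex.
        v = np.zeros(d)
        v[np.argmin(g)] = 1.0
        gamma = 2.0 / (k + 2)                  # standard Frank-Wolfe step size
        x = (1 - gamma) * x + gamma * v        # convex combination stays feasible
    return x
```

Because each update is a convex combination of feasible points, the iterate never needs a projection, which is the point of the projection-free (Frank-Wolfe) family; the variance-reduced estimate is what lets the mini-batch size grow only linearly in k here, versus k² for plain stochastic Frank-Wolfe in the quoted setup.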