SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques

Authors: Elad Richardson, Rom Herskovitz, Boris Ginsburg, Michael Zibulevsky

NeurIPS 2016

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | The method was evaluated on several deep learning tasks, demonstrating significant improvements in performance.
Researcher Affiliation | Collaboration | Technion, Israel Institute of Technology; Nvidia Inc.
Pseudocode | Yes | Algorithm 1 (The SEBOOST algorithm) and Algorithm 2 (Controlling anchors in SEBOOST).
Open Source Code | Yes | "Our algorithm was implemented and evaluated using the Torch7 framework [1], and is publicly available" at https://github.com/eladrich/seboost
Open Datasets | Yes | The MNIST dataset was used, with 60,000 training images of size 28 × 28 and 10,000 test images. For classification, the standard CIFAR-10 benchmark was used. A further dataset was divided into 18,000 training examples and 2,000 test examples.
Dataset Splits | No | The paper specifies training and test splits for the datasets but does not explicitly mention a separate validation set or describe a validation procedure (e.g., cross-validation).
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., CPU or GPU models); it only refers to "actual processor time" in general terms.
Software Dependencies | No | The paper mentions using "the Torch7 framework [1]" but does not specify a version number for Torch7 or for any other software libraries or dependencies.
Experiment Setup | Yes | The main hyper-parameters that were altered during the experiments were: lr_method, the learning rate of the baseline method; M, the maximal number of old directions; and ℓ, the number of baseline steps between each subspace optimization. For all experiments the weight decay was set to 0.0001 and the momentum was fixed at 0.9 for SGD and NAG. Unless stated otherwise, the number of function evaluations for CG was set to 20. The baseline method used a mini-batch of size 100, while the subspace optimization was applied with a mini-batch of size 1000. (A hedged code sketch of this setup follows the table.)
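To make the reported setup concrete, below is a minimal NumPy/SciPy sketch of a SEBOOST-style outer loop on a toy least-squares problem: ℓ baseline SGD-with-momentum steps between subspace optimizations, at most M stored directions, a larger mini-batch and CG for the subspace step. The toy objective, the variable names (ell, subspace_batch, anchor), and the use of SciPy's CG optimizer are assumptions made for this illustration; it is not the authors' Torch7 implementation (see https://github.com/eladrich/seboost for that).

```python
# Illustrative SEBOOST-style loop on a toy least-squares problem.
# Hyper-parameters follow the paper's reported setup; the rest is an assumption.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A, b = rng.normal(size=(1000, 50)), rng.normal(size=1000)  # toy regression data
x = np.zeros(50)                                           # model parameters

def loss_grad(x, idx):
    """Mini-batch loss 0.5*mean(r^2) and its gradient."""
    r = A[idx] @ x - b[idx]
    return 0.5 * np.mean(r ** 2), A[idx].T @ r / len(idx)

# Hyper-parameters as reported in the experiment setup.
lr, momentum, weight_decay = 0.01, 0.9, 1e-4
ell, M = 100, 10            # baseline steps per subspace step, max old directions
batch, subspace_batch = 100, 1000
cg_evals = 20

velocity = np.zeros_like(x)
directions = []             # history of previous outer directions
anchor = x.copy()           # point where the last subspace step finished

for step in range(1, 2001):
    # Baseline SGD-with-momentum step on a small mini-batch.
    idx = rng.choice(len(b), size=batch, replace=False)
    _, g = loss_grad(x, idx)
    velocity = momentum * velocity - lr * (g + weight_decay * x)
    x = x + velocity

    if step % ell == 0:
        # New direction: movement accumulated since the last anchor.
        directions.append(x - anchor)
        directions = directions[-M:]            # keep at most M directions
        P = np.stack(directions, axis=1)        # subspace basis, shape (n, k)

        # Subspace optimization over coefficients alpha, on a larger mini-batch.
        big_idx = rng.choice(len(b), size=subspace_batch, replace=False)
        def sub_obj(alpha):
            xs = x + P @ alpha
            f, g_full = loss_grad(xs, big_idx)
            return f, P.T @ g_full              # value and subspace gradient

        res = minimize(sub_obj, np.zeros(P.shape[1]), jac=True,
                       method='CG', options={'maxiter': cg_evals})
        x = x + P @ res.x                       # apply the subspace step
        anchor = x.copy()
```

The actual method additionally manages anchor directions (Algorithm 2 in the paper) and supports other baselines such as NAG; those details are omitted from this sketch.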