On the Computational Efficiency of Training Neural Networks

Authors: Roi Livni, Shai Shalev-Shwartz, Ohad Shamir

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We describe some experimental evidence, showing that our practical algorithm is competitive with state-of-the-art neural network training methods for depth-2 networks.
Researcher Affiliation | Academia | Roi Livni (The Hebrew University, roi.livni@mail.huji.ac.il); Shai Shalev-Shwartz (The Hebrew University, shais@cs.huji.ac.il); Ohad Shamir (Weizmann Institute of Science, ohad.shamir@weizmann.ac.il)
Pseudocode | Yes | Figure 1: Approximate tensor maximization. Input: {x_i}_{i=1}^m ⊂ R^d, α ∈ R^m, τ, δ. Output: a ((1-τ)/√d)-approximate solution to max_{||w||=||u||=||v||=1} F(w, u, v) = Σ_i α_i (w^T x_i)(u^T x_i)(v^T x_i). Pick w_1, ..., w_s i.i.d. according to N(0, I_d), with s = 2d log(1/δ). For t = 1, ..., s: set w_t ← w_t / ||w_t||; let A = Σ_i α_i (w_t^T x_i) x_i x_i^T and choose u_t, v_t such that Tr(u_t^T A v_t) ≥ (1-τ) max_{||u||=||v||=1} Tr(u^T A v). Return (w, u, v), the maximizers of max_{i ≤ s} F(w_i, u_i, v_i). (A hedged NumPy sketch of this routine appears after the table.)
Open Source Code | No | The paper does not explicitly state that source code for its methodology is made available or provide a link to a repository for its own code.
Open Datasets | No | We collected 200k training examples of image patches of size 88x40 pixels containing either pedestrians (positive examples) or hard negative examples...
Dataset Splits | Yes | We used half of the examples as a training set and the other half as a test set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'the Matlab implementation provided in http://www.mathworks.com/matlabcentral/fileexchange/33863-histograms-of-oriented-gradients' for HOG features. However, it does not specify a version number for Matlab or the specific implementation, which is required for reproducibility.
Experiment Setup | No | The paper mentions '40 neurons in the hidden layer' for the network architecture and discusses 'heuristics for initialization of the weights, learning rate rules, mini-batches, Nesterov's momentum... and dropout' for SGD, but does not provide concrete numerical values for these hyperparameters or detailed system-level training settings. (An illustrative setup sketch follows the table.)
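
For readers who want to experiment with the quoted Figure 1 pseudocode, the NumPy sketch below implements one plausible reading of it. The function name, the restart count s = 2d log(1/δ), and the use of an exact SVD in place of the (1 - τ)-approximate inner maximization are assumptions drawn from the quote above, not from a reference implementation by the authors.

```python
import numpy as np

def approx_tensor_max(X, alpha, delta=0.05, rng=None):
    """Approximately maximize F(w, u, v) = sum_i alpha_i (w.x_i)(u.x_i)(v.x_i)
    over unit vectors w, u, v, following the quoted Figure 1 pseudocode.

    X     : (m, d) array whose rows are the points x_i
    alpha : (m,) array of weights
    delta : failure probability controlling the number of random restarts
    """
    rng = np.random.default_rng() if rng is None else rng
    m, d = X.shape
    s = max(1, int(np.ceil(2 * d * np.log(1.0 / delta))))  # assumed restart count

    def F(w, u, v):
        return float(np.sum(alpha * (X @ w) * (X @ u) * (X @ v)))

    best, best_val = None, -np.inf
    for _ in range(s):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)                      # project w onto the unit sphere
        # A = sum_i alpha_i (w.x_i) x_i x_i^T
        A = (X * (alpha * (X @ w))[:, None]).T @ X
        # max over unit u, v of u^T A v is attained by the top singular vectors of A,
        # so an exact SVD stands in for the (1 - tau)-approximate inner step.
        U, _, Vt = np.linalg.svd(A)
        u, v = U[:, 0], Vt[0]
        val = F(w, u, v)
        if val > best_val:
            best, best_val = (w, u, v), val
    return best
```

As a quick sanity check, calling approx_tensor_max on a few hundred random points with random weights returns three unit vectors, and lowering delta (more restarts) should not decrease the returned value of F.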
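The Experiment Setup row notes that only the hidden-layer width (40 neurons) is reported. The PyTorch sketch below shows what a complete depth-2 setup using the named heuristics could look like; every value other than the 40 hidden units (input size, activation, dropout rate, learning rate, momentum, batch shape) is an illustrative assumption, not a value taken from the paper.

```python
import torch
from torch import nn

# Only the 40 hidden neurons come from the paper; every other value below is assumed.
d_in = 1000  # input feature dimension (assumption; the HOG feature size is not stated here)

model = nn.Sequential(
    nn.Linear(d_in, 40),   # "40 neurons in the hidden layer"
    nn.ReLU(),             # activation choice is an assumption
    nn.Dropout(p=0.5),     # dropout is named as a heuristic; the rate is assumed
    nn.Linear(40, 1),
)

# Nesterov's momentum and mini-batches are named as heuristics; the values are assumed.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)
loss_fn = nn.SoftMarginLoss()  # binary classification with labels in {-1, +1}; assumed choice

def train_step(x_batch, y_batch):
    """One mini-batch SGD step on a (batch, d_in) input and (batch, 1) target in {-1, +1}."""
    optimizer.zero_grad()
    loss = loss_fn(model(x_batch), y_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that PyTorch's SGD only accepts nesterov=True together with a nonzero momentum, which the assumed momentum=0.9 satisfies.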