On the Computational Efficiency of Training Neural Networks
Authors: Roi Livni, Shai Shalev-Shwartz, Ohad Shamir
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We describe some experimental evidence, showing that our practical algorithm is competitive with state-of-the-art neural network training methods for depth-2 networks. |
| Researcher Affiliation | Academia | Roi Livni The Hebrew University roi.livni@mail.huji.ac.il Shai Shalev-Shwartz The Hebrew University shais@cs.huji.ac.il Ohad Shamir Weizmann Institute of Science ohad.shamir@weizmann.ac.il |
| Pseudocode | Yes | Figure 1: Approximate tensor maximization. Input: {x_i}_{i=1}^m ⊆ R^d, α ∈ R^m, τ, δ. Output: a ((1−τ)/d)-approximate solution to max_{‖w‖=‖u‖=‖v‖=1} F(w, u, v) = Σ_i α_i (w^T x_i)(u^T x_i)(v^T x_i). Pick w_1, ..., w_s i.i.d. according to N(0, I_d). For t = 1, ..., 2d log(1/δ): set w_t ← w_t/‖w_t‖; let A = Σ_i α_i (w_t^T x_i) x_i x_i^T and choose u_t, v_t such that Tr(u_t^T A v_t) ≥ (1−τ) max_{‖u‖=‖v‖=1} Tr(u^T A v). Return w, u, v, the maximizers of max_{i≤s} F(w_i, u_i, v_i). (A runnable sketch of this procedure is given after the table.) |
| Open Source Code | No | The paper does not explicitly state that source code for its methodology is made available or provide a link to a repository for its own code. |
| Open Datasets | No | We collected 200k training examples of image patches of size 88x40 pixels containing either pedestrians (positive examples) or hard negative examples... |
| Dataset Splits | Yes | We used half of the examples as a training set and the other half as a test set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'the Matlab implementation provided in http://www.mathworks.com/matlabcentral/fileexchange/33863-histograms-of-oriented-gradients' for HOG features. However, it does not specify a version of Matlab or of that implementation, which is required for reproducibility. |
| Experiment Setup | No | The paper mentions '40 neurons in the hidden layer' for the network architecture and discusses 'heuristics for initialization of the weights, learning rate rules, mini-batches, Nesterov's momentum... and dropout' for SGD, but does not provide concrete numerical values for these hyperparameters or detailed system-level training settings. |
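
To make the quoted pseudocode concrete, below is a minimal NumPy sketch of the approximate tensor maximization step. It is not the authors' implementation: the function name `approx_tensor_max`, the default values of τ and δ, and the use of an exact top singular pair in place of the paper's (1−τ)-approximate matrix maximization are assumptions made for illustration.

```python
import numpy as np

def approx_tensor_max(X, alpha, tau=0.1, delta=0.1, rng=None):
    """Sketch of Figure 1 (approximate tensor maximization).

    Seeks unit vectors w, u, v approximately maximizing
        F(w, u, v) = sum_i alpha_i (w.x_i)(u.x_i)(v.x_i).
    Names and defaults are illustrative, not the authors' code.
    tau is kept to mirror the pseudocode but is unused here,
    because the inner matrix step is solved exactly via SVD.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, d = X.shape
    s = int(np.ceil(2 * d * np.log(1.0 / delta)))  # number of random directions

    def F(w, u, v):
        return float(np.sum(alpha * (X @ w) * (X @ u) * (X @ v)))

    best_val, best = -np.inf, None
    for _ in range(s):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)  # w_t <- w_t / ||w_t||
        # A = sum_i alpha_i (w.x_i) x_i x_i^T  (a d x d matrix)
        A = (X * (alpha * (X @ w))[:, None]).T @ X
        # The paper only requires a (1 - tau)-approximate maximizer of
        # Tr(u^T A v); here we take the exact top singular pair instead.
        U, _, Vt = np.linalg.svd(A)
        u, v = U[:, 0], Vt[0, :]
        val = F(w, u, v)
        if val > best_val:
            best_val, best = val, (w, u, v)
    return best, best_val
```

For example, calling `approx_tensor_max(X, alpha)` with `X` of shape `(m, d)` and `alpha` of shape `(m,)` returns unit vectors `(w, u, v)` together with the attained value of F, mirroring the return step of the quoted pseudocode.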