Probabilistic Line Searches for Stochastic Optimization
Authors: Maren Mahsereci, Philipp Hennig
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments were performed on the well-worn problems of training a 2-layer neural net with logistic nonlinearity on the MNIST and CIFAR-10 datasets. [...] Fig. 4, top, shows test errors after 10 epochs as a function of the initial learning rate α0 (error bars based on 20 random re-starts). |
| Researcher Affiliation | Academia | Maren Mahsereci and Philipp Hennig, Max Planck Institute for Intelligent Systems, Spemannstraße 38, 72076 Tübingen, Germany, [mmahsereci\|phennig]@tue.mpg.de |
| Pseudocode | No | The paper describes the algorithm in prose and mathematical formulations but does not include structured pseudocode or an algorithm block. |
| Open Source Code | No | Our matlab implementation will be made available at time of publication of this article. |
| Open Datasets | Yes | Our experiments were performed on the well-worn problems of training a 2-layer neural net with logistic nonlinearity on the MNIST and CIFAR-10 datasets. [...] http://yann.lecun.com/exdb/mnist/ and http://www.cs.toronto.edu/~kriz/cifar.html. |
| Dataset Splits | No | The paper mentions 'batches of size m = 10' and 'test errors' but does not specify details for a separate validation split, such as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper mentions 'Our matlab implementation' but does not provide specific version numbers for Matlab or any other software dependencies. |
| Experiment Setup | Yes | We then trained networks with vanilla SGD with and without α-decay (using the schedule α(i) = α0/i), and SGD using the probabilistic line search, with α0 ranging across five orders of magnitude, on batches of size m = 10. [...] In our networks, constant learning rates of α = 0.75 and α = 0.08 for MNIST and CIFAR-10, respectively, achieved the lowest test error after the first 10³ steps of SGD. (An illustrative sketch of this setup appears below the table.) |
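To make the quoted setup concrete, here is a minimal Python/NumPy sketch of the kind of experiment the Research Type and Experiment Setup rows describe: a 2-layer network with logistic (sigmoid) nonlinearity, trained by vanilla minibatch SGD on batches of size m = 10, with or without the decay schedule α(i) = α0/i. This is an illustrative reconstruction, not the authors' Matlab implementation (which, per the Open Source Code row, is not available); the squared-error loss, initialization scale, and helper names (`init_params`, `loss_and_grads`, `sgd`) are assumptions the paper does not specify.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d_in, d_hidden, d_out, rng):
    """Random init for a 2-layer net with logistic (sigmoid) nonlinearity.
    The 0.1 scale is an illustrative assumption, not from the paper."""
    return {
        "W1": rng.standard_normal((d_in, d_hidden)) * 0.1,
        "b1": np.zeros(d_hidden),
        "W2": rng.standard_normal((d_hidden, d_out)) * 0.1,
        "b2": np.zeros(d_out),
    }

def loss_and_grads(params, X, y):
    """Squared-error loss on a minibatch (assumed; the paper does not state
    its loss) and its gradients via backprop through both sigmoid layers."""
    h = sigmoid(X @ params["W1"] + params["b1"])    # hidden layer
    out = sigmoid(h @ params["W2"] + params["b2"])  # output layer
    err = out - y
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    d_out = err * out * (1.0 - out) / X.shape[0]    # dL/d(pre-activation), output
    d_h = (d_out @ params["W2"].T) * h * (1.0 - h)  # dL/d(pre-activation), hidden
    grads = {
        "W2": h.T @ d_out, "b2": d_out.sum(axis=0),
        "W1": X.T @ d_h,   "b1": d_h.sum(axis=0),
    }
    return loss, grads

def sgd(params, X, y, alpha0, n_steps, batch_size=10, decay=True, seed=0):
    """Vanilla minibatch SGD; decay=True applies alpha(i) = alpha0 / i."""
    rng = np.random.default_rng(seed)
    for i in range(1, n_steps + 1):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        _, grads = loss_and_grads(params, X[idx], y[idx])
        alpha = alpha0 / i if decay else alpha0  # the paper's decay schedule
        for k in params:
            params[k] -= alpha * grads[k]
    return params
```

With `decay=False` and `alpha0=0.75` (MNIST) or `alpha0=0.08` (CIFAR-10), the loop reduces to the constant-rate baselines quoted above, and sweeping `alpha0` across five orders of magnitude mirrors the paper's learning-rate sensitivity experiment. The paper's actual contribution replaces the fixed `alpha` with a step size chosen at each iteration by its probabilistic line search, which this sketch does not implement.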