Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
Authors: Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance. |
| Researcher Affiliation | Academia | Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho (Université de Montréal) dauphiya@iro.umontreal.ca, r.pascanu@gmail.com, gulcehrc@iro.umontreal.ca, kyunghyun.cho@umontreal.ca; Surya Ganguli (Stanford University) sganguli@stanford.edu; Yoshua Bengio (Université de Montréal, CIFAR Fellow) yoshua.bengio@umontreal.ca |
| Pseudocode | Yes | Algorithm 1 Approximate saddle-free Newton (a hedged sketch of the core update follows the table) |
| Open Source Code | No | The paper does not explicitly state that the source code for their methodology is made publicly available. |
| Open Datasets | Yes | The paper reports experiments on 'a small MLP trained on a down-sampled version of MNIST and CIFAR-10', a 'deep autoencoder trained on (full-scale) MNIST', and 'a small recurrent neural network having 120 hidden units for the task of character-level language modeling on Penn Treebank corpus'. |
| Dataset Splits | No | The paper mentions training on datasets like MNIST, CIFAR-10, and Penn Treebank, and that 'The hyperparameters of SGD were selected via random search', which implies a validation process. However, it does not explicitly provide specific percentages, counts, or predefined splits for training, validation, and testing. |
| Hardware Specification | No | The paper acknowledges 'Compute Canada, and Calcul Québec for providing computational resources', implying high-performance computing, but does not provide specific hardware details such as GPU or CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using 'Theano (Bergstra et al., 2010; Bastien et al., 2012)' but does not provide specific version numbers for Theano or any other software dependencies crucial for reproducibility. |
| Experiment Setup | Yes | The paper provides some specific experimental setup details, such as 'we used the Krylov subspace descent approach described earlier with 500 subspace vectors' for deep autoencoders (see the second sketch after this table) and 'trained a small recurrent neural network having 120 hidden units' for RNNs. It also notes that 'The hyperparameters of SGD were selected via random search' and 'damping coefficients... were selected from a small set at each update'. |
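
The paper's 'Algorithm 1 Approximate saddle-free Newton' rescales the gradient by the inverse of |H|, the Hessian with each eigenvalue replaced by its absolute value, so that negative-curvature directions repel the iterate from a saddle point instead of attracting it, as a plain Newton step would. Below is a minimal NumPy sketch of that core update, assuming the full Hessian is available; the function name, the damping value, and the toy example are illustrative choices, not the authors' code.

```python
# Hedged sketch (not the authors' implementation): the saddle-free Newton direction
# -(|H| + damping*I)^{-1} g, where |H| takes absolute values of H's eigenvalues.
import numpy as np

def saddle_free_newton_step(grad, hessian, damping=1e-3):
    """Return the saddle-free Newton direction for gradient `grad` and Hessian `hessian`."""
    eigvals, eigvecs = np.linalg.eigh(hessian)           # H = V diag(lambda) V^T
    abs_eigvals = np.abs(eigvals) + damping              # |lambda| + damping
    grad_in_eigbasis = eigvecs.T @ grad                  # rotate gradient into the eigenbasis
    step_in_eigbasis = -grad_in_eigbasis / abs_eigvals   # rescale by 1 / |lambda|
    return eigvecs @ step_in_eigbasis                    # rotate back to parameter space

# Toy usage on the 2-D saddle f(x, y) = x^2 - y^2, evaluated near the saddle point.
theta = np.array([0.1, 0.1])
g = np.array([2 * theta[0], -2 * theta[1]])              # gradient of f at theta
H = np.array([[2.0, 0.0], [0.0, -2.0]])                  # Hessian of f
print(saddle_free_newton_step(g, H))
# step ~= [-0.1, 0.1]: descends along the positive-curvature x direction and
# escapes along the negative-curvature y direction, unlike a plain Newton step.
```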
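
For the deep-autoencoder experiments the table quotes 'the Krylov subspace descent approach described earlier with 500 subspace vectors': the saddle-free rescaling is applied inside a low-dimensional Krylov subspace built from Hessian-vector products rather than on the full Hessian, which would be intractable for deep networks. The sketch below reconstructs that idea under stated assumptions; the Gram-Schmidt construction, the helper names, and the default damping are illustrative choices, not the authors' implementation.

```python
# Hedged sketch: approximate saddle-free Newton restricted to a Krylov subspace.
import numpy as np

def approx_saddle_free_step(grad, hvp, k=500, damping=1e-3):
    """`hvp(v)` must return the Hessian-vector product H @ v (e.g. via autodiff)."""
    n = grad.shape[0]
    k = min(k, n)
    # Build an orthonormal Krylov basis {g, Hg, H^2 g, ...} by Gram-Schmidt.
    vectors = []
    v = grad / (np.linalg.norm(grad) + 1e-12)
    for _ in range(k):
        vectors.append(v)
        v = hvp(v)
        for b in vectors:                          # orthogonalize against the basis so far
            v = v - (b @ v) * b
        nrm = np.linalg.norm(v)
        if nrm < 1e-10:                            # Krylov subspace exhausted early
            break
        v = v / nrm
    basis = np.stack(vectors)                      # shape (m, n) with m <= k
    # Hessian and gradient restricted to the subspace (m x m matrix, m-vector).
    H_sub = basis @ np.stack([hvp(b) for b in basis]).T
    H_sub = 0.5 * (H_sub + H_sub.T)                # symmetrize numerically
    g_sub = basis @ grad
    # Saddle-free rescaling: replace subspace eigenvalues by their absolute values.
    eigvals, eigvecs = np.linalg.eigh(H_sub)
    step_sub = -eigvecs @ ((eigvecs.T @ g_sub) / (np.abs(eigvals) + damping))
    return basis.T @ step_sub                      # map the step back to parameter space
```

With a few hundred subspace vectors, the eigendecomposition is performed on a small dense matrix instead of the full parameter-space Hessian, which is what makes the method practical at the scale of the paper's autoencoder and RNN experiments.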