Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

Authors: Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance."
Researcher Affiliation | Academia | Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho (Université de Montréal: dauphiya@iro.umontreal.ca, r.pascanu@gmail.com, gulcehrc@iro.umontreal.ca, kyunghyun.cho@umontreal.ca); Surya Ganguli (Stanford University: sganguli@stanford.edu); Yoshua Bengio (Université de Montréal, CIFAR Fellow: yoshua.bengio@umontreal.ca)
Pseudocode | Yes | The paper provides Algorithm 1, "Approximate saddle-free Newton" (a minimal sketch of the underlying update rule follows the table).
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is publicly available.
Open Datasets | Yes | A small MLP was trained on down-sampled versions of MNIST and CIFAR-10, a deep autoencoder on full-scale MNIST, and a small recurrent neural network with 120 hidden units on the Penn Treebank corpus for character-level language modeling.
Dataset Splits | No | The paper mentions training on MNIST, CIFAR-10, and Penn Treebank, and notes that "The hyperparameters of SGD were selected via random search", which implies a validation process, but it does not give specific percentages, counts, or predefined training/validation/test splits.
Hardware Specification | No | The paper acknowledges "Compute Canada, and Calcul Québec for providing computational resources", implying high-performance computing, but gives no specific hardware details such as GPU or CPU models or memory specifications.
Software Dependencies | No | The paper mentions using "Theano (Bergstra et al., 2010; Bastien et al., 2012)" but does not provide version numbers for Theano or any other software dependency needed for reproducibility.
Experiment Setup | Yes | The paper gives some specific setup details, e.g. "we used the Krylov subspace descent approach described earlier with 500 subspace vectors" for the deep autoencoders and a "small recurrent neural network having 120 hidden units" for the RNN experiments. It also notes that "The hyperparameters of SGD were selected via random search" and that the damping coefficients "were selected from a small set at each update".
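For reference, the following is a minimal NumPy sketch of the exact saddle-free Newton update that Algorithm 1 approximates: the gradient is rescaled by |H|^{-1}, where |H| replaces each Hessian eigenvalue with its absolute value, so directions of negative curvature are descended rather than ascended. The function name, the damping constant, and the toy saddle example are illustrative assumptions; the paper itself applies this update approximately within a low-dimensional Krylov subspace rather than eigendecomposing the full Hessian.

```python
import numpy as np


def saddle_free_newton_step(grad, hess, damping=1e-3):
    """Exact saddle-free Newton direction: -(|H| + damping*I)^{-1} grad.

    Illustrative sketch only; not the paper's Krylov-subspace implementation.
    """
    # Symmetric eigendecomposition of the Hessian: H = V diag(lam) V^T
    lam, V = np.linalg.eigh(hess)
    # Replace eigenvalues by their absolute values and add damping
    abs_lam = np.abs(lam) + damping
    # Rescale the gradient in the eigenbasis and map back to parameter space
    return -V @ ((V.T @ grad) / abs_lam)


# Toy usage on the 2-D saddle f(x, y) = x^2 - y^2 at the point (1, 1)
theta = np.array([1.0, 1.0])
grad = np.array([2.0 * theta[0], -2.0 * theta[1]])   # gradient of f
hess = np.array([[2.0, 0.0], [0.0, -2.0]])           # Hessian of f
theta = theta + saddle_free_newton_step(grad, hess)
# The step moves x toward its minimum and pushes y away from the saddle,
# decreasing f in both coordinates, unlike a plain Newton step.
```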