Charged Point Normalization: An Efficient Solution to the Saddle Point Problem

Authors: Armen Aghajanyan

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The system drastically improves learning in a range of deep neural networks on various data-sets in comparison to non-CPN neural networks.
Researcher Affiliation | Academia | Armen Aghajanyan, Bellevue, WA 98007, USA, armen.ag@live.com. The paper does not give an explicit university or company affiliation, only a personal email address and a city/state. However, since it is a research paper submitted to ICLR (an academic conference), it is classified as academic.
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions that CPN was implemented in Theano and Keras, which are third-party libraries, but does not provide any statement or link to the open-source code for their specific CPN implementation.
Open Datasets | Yes | The first test conducted was using a multilayer perceptron on the MNIST dataset. ... The next experiment conducted was using a convolutional neural network on the CIFAR10 (Krizhevsky et al., a). ... The CIFAR100 (Krizhevsky et al., b) setup was nearly identical... We selected the path-finding problem of the BABI dataset... A dataset-loading sketch is given after the table.
Dataset Splits | No | We do not show results on a validation set, because we care about the efficiency and performance of the optimization algorithm, not whether or not it overfits. ... We used the train split of each data-set. The paper trains on subsets of CIFAR (10,000 or 20,000 random images) but does not specify explicit train/test/validation splits (percentages or counts) needed for full reproducibility.
Hardware Specification | Yes | All training and testing was run on a Nvidia GTX 980 GPU.
Software Dependencies | No | Charged Point Normalization was implemented in Theano (Bastien et al., 2012) and integrated with the Keras (Chollet, 2015) library. The paper does not specify version numbers for these software components.
Experiment Setup | Yes | The CPN hyper-parameters were: β = 0.001, λ = 0.1, with the moving average parameter α = 0.95. ... The optimization algorithm used was stochastic gradient descent with a learning rate of 0.01, decay of 1e-6, momentum of 0.9, with Nesterov acceleration. The batch size used was 32. ... The ADAM (Kingma & Ba, 2014) optimization algorithm was used for both recurrent structures with the parameters: α = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08. An optimizer-configuration sketch is given after the table.
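
Dataset-loading sketch. The datasets named in the Open Datasets row (MNIST, CIFAR10, CIFAR100) all have standard loaders in keras.datasets; bAbI does not and is omitted here. This is a minimal illustration, not the authors' code: the subset size and random seed for the CIFAR subset are assumptions, since the paper reportedly does not describe the exact sampling procedure.

```python
# Minimal sketch (not from the paper): load the public datasets named in the
# report via the standard keras.datasets loaders, and draw a random CIFAR
# subset of the size the report mentions. Subset size and seed are assumptions.
import numpy as np
from keras.datasets import mnist, cifar10, cifar100

(x_train, y_train), (x_test, y_test) = mnist.load_data()   # MNIST MLP experiment
(c10_x, c10_y), _ = cifar10.load_data()                     # CIFAR10 CNN experiment
(c100_x, c100_y), _ = cifar100.load_data()                  # CIFAR100 CNN experiment

# The report states training used random CIFAR subsets of 10,000 or 20,000
# images; the exact sampling procedure is not specified, so this is one
# plausible reading.
rng = np.random.RandomState(0)
idx = rng.choice(len(c10_x), size=10000, replace=False)
c10_subset_x, c10_subset_y = c10_x[idx], c10_y[idx]
```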
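Optimizer-configuration sketch. The Experiment Setup row quotes concrete SGD and ADAM settings, and the paper says CPN was integrated with Keras, so the settings are expressed below with the standard Keras optimizer classes. This is a sketch under those assumptions, not the authors' implementation: the model definition and the CPN mechanism itself are omitted, and the commented compile call is hypothetical.

```python
# Minimal sketch (not the authors' code): the optimizer settings quoted in the
# Experiment Setup row, written with the standard Keras optimizer classes.
from keras.optimizers import SGD, Adam

# CNN/MLP experiments: SGD with lr 0.01, decay 1e-6, momentum 0.9,
# Nesterov acceleration, batch size 32.
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)

# Recurrent experiments: ADAM with the parameters quoted in the report.
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)

# CPN hyper-parameters as reported; how they enter the update rule is defined
# in the paper, not reproduced here.
cpn_beta, cpn_lambda, cpn_alpha = 0.001, 0.1, 0.95

# Hypothetical usage, assuming a Keras `model` object exists:
# model.compile(optimizer=sgd, loss='categorical_crossentropy')
# model.fit(x_train, y_train, batch_size=32)
```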