Charged Point Normalization: An Efficient Solution to the Saddle Point Problem
Authors: Armen Aghajanyan
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The system drastically improves learning in a range of deep neural networks on various data-sets in comparison to non-CPN neural networks. |
| Researcher Affiliation | Academia | Armen Aghajanyan, Bellevue, WA 98007, USA, armen.ag@live.com. The paper provides only a personal email and a city/state, with no explicit university or company affiliation. However, since it is a research paper submitted to ICLR (an academic conference), it is classified as academic. |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that CPN was implemented in Theano and Keras, which are third-party libraries, but does not provide any statement or link to the open-source code for their specific CPN implementation. |
| Open Datasets | Yes | The first test conducted was using a multilayer perceptron on the MNIST dataset. ... The next experiment conducted was using a convolutional neural network on the CIFAR10 (Krizhevsky et al., a). ... The CIFAR100 (Krizhevsky et al., b) setup was nearly identical... We selected the path-finding problem of the BABI dataset... |
| Dataset Splits | No | We do not show results on a validation set, because we care about the efficiency and performance of the optimization algorithm, not whether or not it overfits. ... We used the train split of each data-set. The paper uses subsets of CIFAR (10,000 or 20,000 random images) but does not specify explicit train/test/validation splits (percentages or counts) needed for full reproducibility. |
| Hardware Specification | Yes | All training and testing was run on a Nvidia GTX 980 GPU. |
| Software Dependencies | No | Charged Point Normalization was implemented in Theano (Bastien et al., 2012) and integrated with the Keras (Chollet, 2015) library. The paper does not specify version numbers for these software components. |
| Experiment Setup | Yes | The CPN hyper-parameters were: β = 0.001, λ = 0.1 with the moving average parameter α = 0.95. ... The optimization algorithm used was stochastic gradient descent with a learning rate of 0.01, decay of 1e-6, momentum of 0.9, with Nesterov acceleration. The batch size used was 32. ... The ADAM (Kingma & Ba, 2014) optimization algorithm was used for both recurrent structures with the parameters: α = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08. A hedged Keras sketch of these optimizer settings follows the table. |
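
The sketch below is illustrative only: it maps the quoted hyper-parameters onto the classic Keras optimizer constructors (the `lr`/`decay` argument names match Keras releases of the paper's era; newer tf.keras versions rename them), and the small Dense model is a hypothetical stand-in, since the authors' CPN implementation is not publicly released.

```python
# Sketch of the reported optimizer settings, assuming the classic Keras API.
from keras.optimizers import SGD, Adam
from keras.models import Sequential
from keras.layers import Dense

# SGD reported for the MNIST/CIFAR experiments:
# lr 0.01, decay 1e-6, momentum 0.9, Nesterov acceleration, batch size 32.
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)

# ADAM reported for the recurrent experiments:
# alpha 0.001, beta1 0.9, beta2 0.999, epsilon 1e-08.
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)

# Hypothetical stand-in model (the paper's CPN layers are not released);
# shown only to illustrate how the optimizers would be wired in.
model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax'),
])
model.compile(optimizer=sgd, loss='categorical_crossentropy')
# model.fit(x_train, y_train, batch_size=32)  # batch size as reported
```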