Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks
Authors: David Balduzzi, Brian McWilliams, Tony Butler-Yeoman
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a range of optimizers, layers, and tasks provide evidence that the analysis accurately captures the dynamics of neural optimization. The paper empirically investigates the Taylor optimum and regret terms in Theorem 2 on two tasks: an autoencoder trained on MNIST and a convnet trained on CIFAR-10. |
| Researcher Affiliation | Collaboration | David Balduzzi, School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand (david.balduzzi@vuw.ac.nz); Tony Butler-Yeoman, School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand (butlertony@ecs.vuw.ac.nz); Brian McWilliams, Disney Research, Zurich, Switzerland (brian@disneyresearch.com) |
| Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any statements about releasing open-source code for the described methodology. |
| Open Datasets | Yes | Autoencoder trained on MNIST: dense layers with architecture 784-50-30-20-30-50-784 and ReLU non-linearities, trained with MSE loss using minibatches of 64. Convnet trained on CIFAR-10: three convolutional layers with stack size 64 and 5×5 receptive fields, ReLU non-linearities and 2×2 max-pooling, followed by a 192-unit fully-connected ReLU layer before a ten-dimensional fully-connected output layer, trained with cross-entropy loss using minibatches of 128. |
| Dataset Splits | No | The paper mentions training on MNIST and CIFAR-10 with specified minibatch sizes and refers to training-set accuracy, but it does not provide the specific percentages or counts for training, validation, and test splits needed for reproducibility. For example, it states only that the model achieves 'a small loss and an accuracy of 99% on the training set'. |
| Hardware Specification | Yes | Some experiments were performed using a Tesla K80 GPU kindly donated by NVIDIA. |
| Software Dependencies | No | The paper mentions TensorFlow but does not specify its version number or any other software dependencies with version numbers: 'We thank L. Helminger and T. Vogels for useful discussions and help with TensorFlow.' |
| Experiment Setup | Yes | Autoencoder trained on MNIST: dense layers with architecture 784-50-30-20-30-50-784 and ReLU non-linearities, trained with MSE loss using minibatches of 64. Convnet trained on CIFAR-10: three convolutional layers with stack size 64 and 5×5 receptive fields, ReLU non-linearities and 2×2 max-pooling, followed by a 192-unit fully-connected ReLU layer before a ten-dimensional fully-connected output layer, trained with cross-entropy loss using minibatches of 128. The hyperparameters for the different optimizers are: the autoencoder uses learning rate η = 0.001 for RMSprop and η = 0.01 for Adam, while the convnet uses η = 0.0005 for RMSprop and η = 0.0002 for Adam. All other hyperparameters are kept at their literature-standard values. Hedged implementation sketches based on this description appear below the table. |
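
The following is a minimal sketch of the MNIST autoencoder as described above (784-50-30-20-30-50-784 dense layers, ReLU, MSE loss, minibatch size 64), written against `tf.keras` for concreteness. The paper does not state the framework version, weight initialization, number of epochs, or whether the output layer has a ReLU; those are assumptions here, not the authors' exact implementation.

```python
# Hedged sketch of the MNIST autoencoder setup; unstated details are assumptions.
import tensorflow as tf


def build_autoencoder():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(784,)),
        tf.keras.layers.Dense(50, activation="relu"),
        tf.keras.layers.Dense(30, activation="relu"),
        tf.keras.layers.Dense(20, activation="relu"),
        tf.keras.layers.Dense(30, activation="relu"),
        tf.keras.layers.Dense(50, activation="relu"),
        tf.keras.layers.Dense(784),  # linear reconstruction layer (assumption)
    ])
    # Paper reports eta = 0.001 for RMSprop and eta = 0.01 for Adam on this model;
    # other optimizer hyperparameters stay at their standard values.
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
                  loss="mse")
    return model


(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0  # flatten 28x28 images

model = build_autoencoder()
model.fit(x_train, x_train, batch_size=64, epochs=10)  # epoch count is an assumption
```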
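
Similarly, a hedged sketch of the CIFAR-10 convnet (three 5×5 convolutions with 64 filters, ReLU, 2×2 max-pooling, a 192-unit dense ReLU layer, a 10-way output, cross-entropy loss, minibatch size 128). Padding, strides, initialization, data preprocessing, and the number of epochs are not specified in the paper and are assumptions in this sketch.

```python
# Hedged sketch of the CIFAR-10 convnet setup; unstated details are assumptions.
import tensorflow as tf


def build_convnet():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(192, activation="relu"),
        tf.keras.layers.Dense(10),  # ten-dimensional output, logits
    ])
    # Paper reports eta = 0.0005 for RMSprop and eta = 0.0002 for Adam on this model;
    # other optimizer hyperparameters stay at their standard values.
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0005),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model


(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0  # simple rescaling (assumption)

model = build_convnet()
model.fit(x_train, y_train, batch_size=128, epochs=10)  # epoch count is an assumption
```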