On the Stability and Scalability of Node Perturbation Learning
Authors: Naoki Hiratani, Yash Mehta, Timothy Lillicrap, Peter E. Latham
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate these issues both analytically, in deep linear networks, and numerically, in deep nonlinear ones. |
| Researcher Affiliation | Collaboration | Naoki Hiratani (Harvard; Gatsby Unit, UCL); Yash Mehta (Janelia; Gatsby Unit, UCL); Timothy P. Lillicrap (DeepMind, London, UK); Peter E. Latham (Gatsby Unit, UCL) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Simulation codes for the figures are made available at https://github.com/nhiratani/node_perturbation. Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Appendix C and anonymized repository: https://github.com/nhiratani/node_perturbation. |
| Open Datasets | Yes | We first applied NP to the SARCOS dataset... MNIST tasks... CIFAR-10 task. The SARCOS dataset contains 44000 training samples and 4449 test samples. We used the default split. For MNIST, we used a standard fully connected network with tanh activations. We used 60000 training samples and 10000 test samples. The training data was split into training (55000) and validation (5000). |
| Dataset Splits | Yes | For MNIST, we used a standard fully connected network with tanh activations. We used 60000 training samples and 10000 test samples. The training data was split into training (55000) and validation (5000). Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix C. |
| Hardware Specification | Yes | All of our experiments were run on a single NVIDIA 3090 GPU. |
| Software Dependencies | Yes | We implemented our code in PyTorch (version 1.10.1). |
| Experiment Setup | Yes | We used the Adam optimizer with a learning rate of 10^-3 and batch size 256. The weights were initialized with Xavier (Glorot) initialization. (See the sketch below the table.) |
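
The training details quoted above map onto a short PyTorch script. The sketch below assembles the MNIST setup described in the table: a fully connected tanh network, a 55000/5000 train/validation split, Adam with learning rate 1e-3, batch size 256, and Xavier initialization. The hidden width, the number of layers, and the use of torchvision for data loading are assumptions not stated in the table, and the loop uses ordinary backpropagation as a placeholder; the paper's node-perturbation updates would replace the gradient computation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

HIDDEN = 256  # hypothetical hidden width; not specified in the table

# Fully connected tanh network, as described for the MNIST experiments.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, HIDDEN),
    nn.Tanh(),
    nn.Linear(HIDDEN, HIDDEN),
    nn.Tanh(),
    nn.Linear(HIDDEN, 10),
)

# Xavier (Glorot) initialization, as stated in the experiment setup.
for layer in model.modules():
    if isinstance(layer, nn.Linear):
        nn.init.xavier_uniform_(layer.weight)
        nn.init.zeros_(layer.bias)

# 60000 MNIST training samples split into 55000 train / 5000 validation.
mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
train_set, val_set = random_split(mnist, [55000, 5000])
train_loader = DataLoader(train_set, batch_size=256, shuffle=True)
val_loader = DataLoader(val_set, batch_size=256)

# Adam optimizer with learning rate 1e-3, batch size 256.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative epoch of standard backprop training; the paper's
# node-perturbation gradient estimate is not reproduced here.
for x, y in train_loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

The paper reports running experiments of this kind on a single NVIDIA 3090 GPU with PyTorch 1.10.1; moving the model and batches to that device (`.to("cuda")`) is the only change needed to match that environment.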