Understanding Synthetic Gradients and Decoupled Neural Interfaces
Authors: Wojciech Marian Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an empirical analysis of the learning dynamics on easily analysable artificial data. We create 2 and 100 dimensional versions of four basic datasets (details in the Supplementary Materials Section D) and train four simple models (a linear model and a deep linear one with 10 hidden layers, trained to minimise MSE and log loss) with regular backprop and with a SG-based alternative to see whether it (numerically) converges to the same solution. (A toy backprop-vs-SG convergence comparison in this spirit is sketched after the table.) |
| Researcher Affiliation | Industry | DeepMind, London, United Kingdom. |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present structured steps formatted as code. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We train deep relu networks of varied depth (up to 50 hidden layers) with batch-normalisation and with two different activation functions on MNIST and compare models trained with full backpropagation to variants that employ a SG module in the middle of the hidden stack. |
| Dataset Splits | No | The paper mentions experiments on MNIST, but it does not provide specific details on dataset splits (e.g., percentages, sample counts, or citations to standard splits) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments, only general statements about 'training neural networks'. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) used for the experiments. |
| Experiment Setup | Yes | We train deep relu networks of varied depth (up to 50 hidden layers) with batch-normalisation and with two different activation functions on MNIST and compare models trained with full backpropagation to variants that employ a SG module in the middle of the hidden stack. ... We train with a small L2 penalty added to weights... (A minimal decoupled-training-step sketch illustrating this setup follows the table.) |
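
The "Experiment Setup" excerpt describes placing a synthetic-gradient (SG) module in the middle of a deep relu stack trained on MNIST. The following is a minimal sketch of what one decoupled training step could look like, assuming PyTorch; the layer widths, the linear SG architecture, the zero initialisation, and the synchronous SG update are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical split of a relu network into a lower and an upper stack,
# with an SG module at the interface (sizes are illustrative, not the paper's).
lower = nn.Sequential(nn.Linear(784, 256), nn.BatchNorm1d(256), nn.ReLU())
upper = nn.Linear(256, 10)
sg = nn.Linear(256, 256)           # linear gradient predictor (an assumption)
nn.init.zeros_(sg.weight)
nn.init.zeros_(sg.bias)            # zero init: the SG starts by predicting 0

params = list(lower.parameters()) + list(upper.parameters()) + list(sg.parameters())
# weight_decay stands in for the "small L2 penalty added to weights".
opt = torch.optim.SGD(params, lr=0.1, weight_decay=1e-4)

def decoupled_step(x, y):
    opt.zero_grad()
    h = lower(x)                                   # forward through the lower stack
    # The lower stack updates from the *predicted* gradient, so it does not
    # have to wait for the upper stack's backward pass.
    g_hat = sg(h.detach())
    h.backward(g_hat.detach())
    # The upper stack trains by ordinary backprop from the task loss, which
    # also yields the true interface gradient as a regression target for SG.
    h_det = h.detach().requires_grad_(True)
    F.cross_entropy(upper(h_det), y).backward()
    F.mse_loss(g_hat, h_det.grad.detach()).backward()
    opt.step()

# Stand-in batch (replace with real MNIST batches of flattened 28x28 images):
decoupled_step(torch.randn(32, 784), torch.randint(0, 10, (32,)))
```

In a fully decoupled setting the true gradient would arrive asynchronously; here the SG regression target is computed in the same step purely for readability.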
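
The "Research Type" excerpt compares regular backprop against an SG-based alternative on simple linear models to check whether both (numerically) converge to the same solution. Below is a toy version of that comparison, again a sketch under assumed shapes and hyperparameters (the data, depth, learning rate, and step counts are ours, not the paper's).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy 2-D regression data (a stand-in for the paper's artificial datasets).
torch.manual_seed(0)
X = torch.randn(256, 2)
Y = X @ torch.tensor([[1.5, -0.5], [0.3, 2.0]]).T

def make_model():
    torch.manual_seed(1)           # identical initialisation for both runs
    return nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))

# Run 1: ordinary backprop, minimising MSE.
bp = make_model()
opt = torch.optim.SGD(bp.parameters(), lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    F.mse_loss(bp(X), Y).backward()
    opt.step()

# Run 2: the two layers are decoupled by a linear SG module.
dec = make_model()
lower, upper = dec[0], dec[1]
sg = nn.Linear(2, 2)               # predicts the loss gradient at the interface
nn.init.zeros_(sg.weight)
nn.init.zeros_(sg.bias)
opt = torch.optim.SGD(list(dec.parameters()) + list(sg.parameters()), lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    h = lower(X)
    h_det = h.detach().requires_grad_(True)
    F.mse_loss(upper(h_det), Y).backward()             # true gradient at the interface
    g_hat = sg(h.detach())
    h.backward(g_hat.detach())                         # lower layer learns from the prediction
    F.mse_loss(g_hat, h_det.grad.detach()).backward()  # SG regresses on the true gradient
    opt.step()

print("backprop MSE:", F.mse_loss(bp(X), Y).item())
print("SG MSE:      ", F.mse_loss(dec(X), Y).item())
```

If the linear SG module can represent the true interface gradient, both runs should finish with near-identical MSE, which is the kind of numerical agreement the excerpt describes.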