Decoupled Greedy Learning of CNNs
Authors: Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments that empirically show that DGL optimizes the greedy objective well and compares favorably with recent state-of-the-art proposals for decoupling the training of deep network modules. We show that, unlike previous decoupled proposals, it can still work on a large-scale dataset (ImageNet) and that it can, in some cases, generalize better than standard back-propagation. We then extensively evaluate the asynchronous DGL, simulating large delays. |
| Researcher Affiliation | Collaboration | ¹MILA, ²Center for Computational Mathematics, Flatiron Institute, ³CNRS, LIP6. |
| Pseudocode | Yes | Algorithm 1: Synchronous DGL; Algorithm 2: Asynchronous DGL with Replay (see the sketches after this table). |
| Open Source Code | Yes | Code for experiments is included in the submission. |
| Open Datasets | Yes | We demonstrate the effectiveness of DGL against alternative approaches on the CIFAR-10 dataset (Krizhevsky, 2009) and on the large-scale ImageNet dataset. |
| Dataset Splits | No | The paper mentions using the CIFAR-10 and ImageNet datasets but does not explicitly specify the proportions or counts of the training, validation, and test splits, nor does it point to specific predefined standard splits with detailed information. |
| Hardware Specification | No | The paper mentions a 'single 16GB GPU' but does not provide specific models or manufacturers for the hardware used in experiments (e.g., NVIDIA A100, Intel Xeon). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries (e.g., 'Python 3.8, PyTorch 1.9'). It mentions optimizers like Adam and SGD but without software versions. |
| Experiment Setup | Yes | We reproduce the CIFAR-10 CNN experiment described in (Jaderberg et al., 2017), Appendix C.1. This experiment utilizes a 3-layer network with auxiliary networks of 2 hidden CNN layers... using Adam with a learning rate of 3 × 10⁻⁵. We run training for 1500 epochs... For this experiment we use a buffer of size M = 50. We run separate experiments with the slowdown applied at each layer of the network as well as 3 random seeds for each of these settings (thus 18 experiments per data point). We show the evaluations for 10 values of S. (See the replay-buffer sketch after this table.) |
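
To make the pseudocode row concrete, below is a minimal sketch of the synchronous DGL update (Algorithm 1): each module owns its own auxiliary head and optimizer, trains on its local greedy loss, and passes a detached activation to the next module so no gradient flows backward across modules. The architecture, auxiliary-head design, and hyperparameters here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class GreedyModule(nn.Module):
    """One DGL module: a convolutional block plus a local auxiliary classifier."""

    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        )
        self.aux = nn.Sequential(  # auxiliary head for the local greedy objective
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(out_ch, num_classes),
        )

    def forward(self, x):
        h = self.block(x)
        return h, self.aux(h)


def dgl_step(modules, optimizers, x, y, criterion):
    """One synchronous DGL step: every module updates on its own auxiliary loss."""
    h = x
    losses = []
    for module, opt in zip(modules, optimizers):
        h_out, logits = module(h)
        loss = criterion(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
        h = h_out.detach()  # block gradient flow into earlier modules
    return losses


if __name__ == "__main__":
    torch.manual_seed(0)
    modules = nn.ModuleList(
        [GreedyModule(3, 32, 10), GreedyModule(32, 64, 10), GreedyModule(64, 128, 10)]
    )
    optimizers = [torch.optim.Adam(m.parameters(), lr=3e-5) for m in modules]
    criterion = nn.CrossEntropyLoss()
    x = torch.randn(8, 3, 32, 32)            # dummy CIFAR-10-sized batch
    y = torch.randint(0, 10, (8,))
    print(dgl_step(modules, optimizers, x, y, criterion))
```

Because each module's update depends only on its own buffer of inputs and labels, the updates can in principle run in parallel, which is what the asynchronous variant exploits.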
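
The "buffer of size M = 50" in the experiment-setup row refers to the replay buffer of asynchronous DGL (Algorithm 2). The single-process simulation below sketches only the replay mechanic: a module stores the detached inputs it receives in a bounded buffer and trains on batches sampled from it, so it can keep updating while its upstream module is delayed. The slowdown factor S, the simulation loop, and the tiny network are hypothetical stand-ins, not the paper's exact protocol.

```python
import random
from collections import deque

import torch
import torch.nn as nn

M = 50  # replay buffer capacity, matching the M = 50 reported in the setup
S = 3   # hypothetical slowdown: a fresh upstream batch arrives only every S steps

block = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))
opt = torch.optim.Adam(list(block.parameters()) + list(head.parameters()), lr=3e-5)
criterion = nn.CrossEntropyLoss()

buffer = deque(maxlen=M)  # holds (input, label) pairs; oldest entries are evicted

for step in range(30):
    if step % S == 0:
        # a fresh (detached) batch arrives from the simulated upstream module
        x = torch.randn(8, 3, 32, 32)
        y = torch.randint(0, 10, (8,))
        buffer.append((x, y))
    if not buffer:
        continue
    # otherwise replay: sample a stored batch and perform a local update anyway
    xb, yb = random.choice(buffer)
    loss = criterion(head(block(xb)), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()
```
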