Decoupled Greedy Learning of CNNs

Authors: Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments that empirically show that DGL optimizes the greedy objective well, showing it is favorable against recent state-of-the-art proposals for decoupling training of deep network modules. We show that unlike previous decoupled proposals it can still work on a large-scale dataset (ImageNet) and that it can, in some cases, generalize better than standard back-propagation. We then extensively evaluate the asynchronous DGL, simulating large delays.
Researcher Affiliation | Collaboration | MILA; Center for Computational Mathematics, Flatiron Institute; CNRS, LIP6.
Pseudocode | Yes | Algorithm 1: Synchronous DGL; Algorithm 2: Asynchronous DGL with Replay (a minimal sketch of the synchronous update follows the table).
Open Source Code | Yes | Code for experiments is included in the submission.
Open Datasets | Yes | We demonstrate the effectiveness of DGL against alternative approaches on the CIFAR-10 dataset and on the large-scale ImageNet dataset. (Krizhevsky, 2009)
Dataset Splits | No | The paper uses the CIFAR-10 and ImageNet datasets but does not explicitly specify the proportions or counts of the training, validation, and test splits, nor does it point to detailed predefined standard splits.
Hardware Specification | No | The paper mentions a 'single 16GB GPU' but does not specify the hardware models or manufacturers used in the experiments (e.g., NVIDIA A100, Intel Xeon).
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries (e.g., 'Python 3.8, PyTorch 1.9'). It mentions optimizers like Adam and SGD but without software versions.
Experiment Setup | Yes | We reproduce the CIFAR-10 CNN experiment described in (Jaderberg et al., 2017), Appendix C.1. This experiment utilizes a 3-layer network with auxiliary networks of 2 hidden CNN layers... using Adam with a learning rate of 3 × 10⁻⁵. We run training for 1500 epochs... For this experiment we use a buffer of size M = 50. We run separate experiments with the slowdown applied at each layer of the network as well as 3 random seeds for each of these settings (thus 18 experiments per data point). We show the evaluations for 10 values of S.
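
As a rough illustration of the pseudocode named above, the following is a minimal sketch of the synchronous DGL update in PyTorch-style Python: each module is paired with an auxiliary classifier, trains only on its local loss, and passes a detached activation to the next module. The block widths, auxiliary-head design, and learning rate are illustrative assumptions rather than the paper's exact configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LocalBlock(nn.Module):
        """A convolutional module paired with an auxiliary classifier."""
        def __init__(self, in_ch, out_ch, num_classes=10):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
            )
            # The auxiliary head supplies the local (greedy) training signal.
            self.aux = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(out_ch, num_classes),
            )

        def forward(self, x):
            h = self.body(x)
            return h, self.aux(h)

    # Three decoupled blocks, each with its own optimizer (widths are illustrative).
    blocks = [LocalBlock(3, 64), LocalBlock(64, 128), LocalBlock(128, 256)]
    opts = [torch.optim.Adam(b.parameters(), lr=3e-5) for b in blocks]

    def dgl_step(x, y):
        """One synchronous DGL update: every block trains on its own auxiliary
        loss, and only a detached activation is forwarded, so no gradient
        crosses module boundaries."""
        for block, opt in zip(blocks, opts):
            h, logits = block(x)
            loss = F.cross_entropy(logits, y)
            opt.zero_grad()
            loss.backward()      # gradients stay inside this block
            opt.step()
            x = h.detach()       # decouple: earlier blocks receive no gradient

The asynchronous variant (Algorithm 2) runs the same per-block update, but each block consumes whatever input is currently available from its upstream module, which is where the replay buffer sketched next comes in.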
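
For the asynchronous-with-replay experiments quoted in the Experiment Setup row, a bounded buffer of size M = 50 lets a module keep training while an artificially slowed upstream module lags behind. Only the buffer size follows the quoted setup; the queue interface and uniform sampling policy below are illustrative assumptions.

    import random
    from collections import deque
    from queue import Empty, Queue

    M = 50                    # buffer size from the quoted experiment setup
    buffer = deque(maxlen=M)  # recent (activation, label) batches from upstream

    def next_batch(upstream: Queue):
        """Return a fresh batch if the upstream module has produced one;
        otherwise replay a stored batch so this module is never idle."""
        try:
            batch = upstream.get_nowait()
            buffer.append(batch)
            return batch
        except Empty:
            return random.choice(list(buffer)) if buffer else None

Under this scheme, the slowdown factor S swept in the quoted experiments would govern how often the upstream queue is empty and a replayed batch is used instead.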