Towards Scaling Difference Target Propagation by Learning Backprop Targets

Authors: Maxence M Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present several experimental results supporting the above theory. We first numerically demonstrate the claims stated by Theorem 4.2 and Theorem 4.3, thereby showing the efficiency of the proposed approach to align feedforward and feedback weights (JMC) and subsequently compute DTP feedforward weight updates well aligned with BP gradients (GMP). Next, we present training simulation results on MNIST, F-MNIST and CIFAR-10, where our approach significantly outperforms Meulemans et al. (2020)'s DTP. Finally, we report the best results ever obtained on ImageNet 32×32 by a DTP algorithm. (see the first code sketch below the table)
Researcher Affiliation | Collaboration | 1IBM Research, Paris (work done during a remote internship at Mila); 2Mila; 3Concordia University; 4UdeM; 5McGill University; 6Montreal Neurological Institute.
Pseudocode | Yes | Algorithm 1: Standard DTP feedback weight training (Lee et al., 2015) (see the second code sketch below the table)
Open Source Code | Yes | Our code is available at https://github.com/ernoult/scalingDTP.
Open Datasets | Yes | Finally, we validate our novel implementation of DTP on training experiments on MNIST, Fashion-MNIST, CIFAR-10 and ImageNet 32×32 (van den Oord et al., 2016) (Section 5.3).
Dataset Splits | No | We display in Table 1 the accuracies obtained with our DTP, s-DDTP and p-DDTP on MNIST, Fashion-MNIST (F-MNIST) and CIFAR-10. Our DTP outperforms s-DDTP and p-DDTP on all tasks, by 0.3% on MNIST and F-MNIST, by at least 9% on CIFAR-10, and is within 1% of the BP baseline performance.
Hardware Specification | No | As can be seen from Appendix E, our DTP implementation can be up to 30 times slower than BP.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | Yes | In Tables 5, 6, 8 and 9, we report the hyperparameters for each method and dataset studied. In both the CIFAR-10 and ImageNet 32×32 experiments we use the same data augmentation, consisting of random horizontal flipping with probability 0.5 and random cropping with padding 4. (see the third code sketch below the table)
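
The "Research Type" row refers to measuring how well DTP feedforward weight updates align with BP gradients (GMP). A minimal sketch of one way such an alignment could be quantified, using cosine similarity between flattened update vectors; the tensors below are illustrative stand-ins, not the paper's measured quantities:

```python
import torch
import torch.nn.functional as F

def cosine_alignment(dtp_update: torch.Tensor, bp_gradient: torch.Tensor) -> float:
    """Cosine similarity between a DTP weight update and the corresponding BP gradient.

    Values close to 1 indicate the two updates point in nearly the same
    direction in parameter space.
    """
    return F.cosine_similarity(dtp_update.flatten(), bp_gradient.flatten(), dim=0).item()

# Hypothetical usage with stand-in tensors for one layer's weight update.
dtp_update = torch.randn(128, 256)
bp_gradient = dtp_update + 0.1 * torch.randn(128, 256)
print(f"cosine alignment: {cosine_alignment(dtp_update, bp_gradient):.3f}")
```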
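The "Pseudocode" row quotes Algorithm 1, the standard DTP feedback weight training of Lee et al. (2015). The following PyTorch-style sketch approximates one reconstruction step of that scheme under simplifying assumptions; the layer modules f_l and g_l, the noise scale sigma, and the toy dimensions are placeholders, not code from the paper:

```python
import torch
import torch.nn as nn

def feedback_step(f_l: nn.Module, g_l: nn.Module, h_prev: torch.Tensor,
                  sigma: float, optimizer: torch.optim.Optimizer) -> float:
    """One reconstruction step for a single layer's feedback weights.

    The feedback mapping g_l is trained to invert the frozen feedforward
    mapping f_l on noise-corrupted activations, in the spirit of the
    standard DTP reconstruction loss of Lee et al. (2015).
    """
    noisy = (h_prev + sigma * torch.randn_like(h_prev)).detach()  # corrupted layer input
    with torch.no_grad():
        forward = f_l(noisy)            # feedforward pass, weights kept frozen
    recon = g_l(forward)                # feedback attempts to reconstruct the corrupted input
    loss = ((recon - noisy) ** 2).flatten(1).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()                     # gradients flow only into the feedback parameters
    optimizer.step()
    return loss.item()

# Hypothetical usage with toy linear layers.
f_l, g_l = nn.Linear(100, 50), nn.Linear(50, 100)
opt = torch.optim.SGD(g_l.parameters(), lr=1e-2)
h_prev = torch.randn(32, 100)
print(feedback_step(f_l, g_l, h_prev, sigma=0.1, optimizer=opt))
```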
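The "Experiment Setup" row describes the CIFAR-10 and ImageNet 32×32 data augmentation. A minimal torchvision sketch of that pipeline, assuming standard transforms suffice (the paper's excerpt does not specify its exact preprocessing code or normalization statistics):

```python
import torchvision.transforms as T

# Training-time augmentation matching the description quoted above:
# random horizontal flip with probability 0.5, then a random 32x32 crop
# with 4 pixels of padding. ToTensor is added for completeness; any
# normalization statistics are not given in the excerpt.
train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
])
```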