Towards Scaling Difference Target Propagation by Learning Backprop Targets

Authors: Maxence M Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present several experimental results supporting the above theory. We first numerically demonstrate the claims stated by Theorem 4.2 and Theorem 4.3, thereby showing the efficiency of the proposed approach to align feedforward and feedback weights (JMC) and subsequently compute DTP feedforward weight updates well aligned with BP gradients (GMP). Next, we present training simulation results on MNIST, F-MNIST and CIFAR-10, where our approach significantly outperforms Meulemans et al. (2020)'s DTP. Finally, we report the best results ever obtained on ImageNet 32×32 by a DTP algorithm. (see the first code sketch below the table)
Researcher Affiliation | Collaboration | 1IBM Research, Paris (work done during a remote internship at Mila); 2Mila; 3Concordia University; 4UdeM; 5McGill University; 6Montreal Neurological Institute.
Pseudocode | Yes | Algorithm 1: Standard DTP feedback weight training (Lee et al., 2015) (see the second code sketch below the table)
Open Source Code | Yes | Our code is available at https://github.com/ernoult/scalingDTP.
Open Datasets | Yes | Finally, we validate our novel implementation of DTP on training experiments on MNIST, Fashion-MNIST, CIFAR-10 and ImageNet 32×32 (van den Oord et al., 2016) (Section 5.3).
Dataset Splits | No | We display in Table 1 the accuracies obtained with our DTP, s-DDTP and p-DDTP on MNIST, Fashion-MNIST (F-MNIST) and CIFAR-10. Our DTP outperforms s-DDTP and p-DDTP on all tasks, by 0.3% on MNIST and F-MNIST, by at least 9% on CIFAR-10, and is within 1% of the BP baseline performance.
Hardware Specification | No | As can be seen from Appendix E, our DTP implementation can be up to 30 times slower than BP.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | Yes | In Tables 5, 6, 8 and 9, we report the hyperparameters for each method and dataset studied. In both the CIFAR-10 and ImageNet 32×32 experiments we use the same data augmentation, consisting of random horizontal flipping with probability 0.5 and random cropping with padding 4. (see the third code sketch below the table)
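
The "Research Type" row refers to measuring how well DTP feedforward weight updates align with BP gradients (GMP). A minimal sketch of one way such an alignment could be quantified, using cosine similarity between flattened update vectors; the tensors below are illustrative stand-ins, not the paper's measured quantities:

```python
import torch
import torch.nn.functional as F

def cosine_alignment(dtp_update: torch.Tensor, bp_gradient: torch.Tensor) -> float:
    """Cosine similarity between a DTP weight update and the corresponding BP gradient.

    Values close to 1 indicate the two updates point in nearly the same
    direction in parameter space.
    """
    return F.cosine_similarity(dtp_update.flatten(), bp_gradient.flatten(), dim=0).item()

# Hypothetical usage with stand-in tensors for one layer's weight update.
dtp_update = torch.randn(128, 256)
bp_gradient = dtp_update + 0.1 * torch.randn(128, 256)
print(f"cosine alignment: {cosine_alignment(dtp_update, bp_gradient):.3f}")
```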
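The "Pseudocode" row quotes Algorithm 1, the standard DTP feedback weight training of Lee et al. (2015). The following PyTorch-style sketch approximates one reconstruction step of that scheme under simplifying assumptions; the layer modules f_l and g_l, the noise scale sigma, and the toy dimensions are placeholders, not code from the paper:

```python
import torch
import torch.nn as nn

def feedback_step(f_l: nn.Module, g_l: nn.Module, h_prev: torch.Tensor,
                  sigma: float, optimizer: torch.optim.Optimizer) -> float:
    """One reconstruction step for a single layer's feedback weights.

    The feedback mapping g_l is trained to invert the frozen feedforward
    mapping f_l on noise-corrupted activations, in the spirit of the
    standard DTP reconstruction loss of Lee et al. (2015).
    """
    noisy = (h_prev + sigma * torch.randn_like(h_prev)).detach()  # corrupted layer input
    with torch.no_grad():
        forward = f_l(noisy)            # feedforward pass, weights kept frozen
    recon = g_l(forward)                # feedback attempts to reconstruct the corrupted input
    loss = ((recon - noisy) ** 2).flatten(1).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()                     # gradients flow only into the feedback parameters
    optimizer.step()
    return loss.item()

# Hypothetical usage with toy linear layers.
f_l, g_l = nn.Linear(100, 50), nn.Linear(50, 100)
opt = torch.optim.SGD(g_l.parameters(), lr=1e-2)
h_prev = torch.randn(32, 100)
print(feedback_step(f_l, g_l, h_prev, sigma=0.1, optimizer=opt))
```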
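The "Experiment Setup" row describes the CIFAR-10 and ImageNet 32×32 data augmentation. A minimal torchvision sketch of that pipeline, assuming standard transforms suffice (the paper's excerpt does not specify its exact preprocessing code or normalization statistics):

```python
import torchvision.transforms as T

# Training-time augmentation matching the description quoted above:
# random horizontal flip with probability 0.5, then a random 32x32 crop
# with 4 pixels of padding. ToTensor is added for completeness; any
# normalization statistics are not given in the excerpt.
train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
])
```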