Towards Scaling Difference Target Propagation by Learning Backprop Targets
Authors: Maxence M Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present several experimental results supporting the above theory. We first numerically demonstrate the claims stated by Theorem 4.2 and Theorem 4.3, thereby showing the efficiency of the proposed approach to align feedforward and feedback weights (JMC) and subsequently compute DTP feedforward weight updates well aligned with BP gradients (GMP). Next, we present training simulation results on MNIST, F-MNIST and CIFAR-10, where our approach significantly outperforms Meulemans et al. (2020)'s DTP. Finally, we report the best results ever obtained on ImageNet 32×32 by a DTP algorithm. |
| Researcher Affiliation | Collaboration | 1 IBM Research, Paris (work done during a remote internship at Mila); 2 Mila; 3 Concordia University; 4 UdeM; 5 McGill University; 6 Montreal Neurological Institute. |
| Pseudocode | Yes | Algorithm 1: Standard DTP feedback weight training (Lee et al., 2015) (a hedged sketch of this feedback training loop is given after the table). |
| Open Source Code | Yes | Our code is available at https://github.com/ernoult/scalingDTP. |
| Open Datasets | Yes | Finally, we validate our novel implementation of DTP on training experiments on MNIST, Fashion MNIST, CIFAR-10 and ImageNet 32×32 (van den Oord et al., 2016) (Section 5.3). |
| Dataset Splits | No | We display in Table 1 the accuracies obtained with our DTP, s-DDTP and p-DDTP on MNIST, Fashion MNIST (F-MNIST) and CIFAR-10. Our DTP outperforms s-DDTP and p-DDTP on all tasks, by 0.3% on MNIST and F-MNIST, by at least 9% on CIFAR-10, and is within 1% of the BP baseline performance. |
| Hardware Specification | No | As can be seen in Appendix E, our DTP implementation can be up to 30 times slower than BP. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | Yes | In Tables 5, 6, 8 and 9, we report the hyperparameters for each method and dataset studied. In both CIFAR-10 and ImageNet 32×32 experiments we use the same data augmentation, consisting of random horizontal flipping with 0.5 probability and random cropping with padding 4 (see the augmentation sketch after the table). |
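
The pseudocode row references Algorithm 1, the standard DTP feedback weight training of Lee et al. (2015). As a rough illustration only, the sketch below shows the usual noisy-reconstruction objective used to train the feedback (inverse) layers; the names `forward_layers`, `feedback_layers` and the noise scale `sigma` are illustrative assumptions, and the paper's own modified feedback objective is not reproduced here.

```python
# Hedged sketch (not the authors' code): one pass of standard DTP
# feedback-weight training in the spirit of Lee et al. (2015), where each
# feedback layer g_i is trained to invert its forward layer f_i via a
# noisy reconstruction loss. Layer containers and `sigma` are assumptions.
import torch
import torch.nn.functional as F

def train_feedback_weights(forward_layers, feedback_layers, h_prev_list,
                           sigma=0.1, lr=1e-3):
    """forward_layers[i]: f_i; feedback_layers[i]: g_i (updated here);
    h_prev_list[i]: activity of layer i-1 saved from the forward pass."""
    for f_i, g_i, h_prev in zip(forward_layers, feedback_layers, h_prev_list):
        opt = torch.optim.SGD(g_i.parameters(), lr=lr)
        # Corrupt the presynaptic activity with Gaussian noise.
        h_noisy = h_prev.detach() + sigma * torch.randn_like(h_prev)
        # g_i should map f_i's output back onto the corrupted input;
        # f_i's output is detached so only the feedback weights update.
        recon = g_i(f_i(h_noisy).detach())
        loss = F.mse_loss(recon, h_noisy)
        opt.zero_grad()
        loss.backward()
        opt.step()
```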
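
The experiment-setup row quotes the augmentation used for CIFAR-10 and ImageNet 32×32: random horizontal flipping with probability 0.5 and random cropping with padding 4. A minimal torchvision sketch, assuming the standard `transforms` API and a 32×32 input size:

```python
# Hedged sketch of the described augmentation; the 32x32 crop size matches
# these datasets, and no normalization is included since the quoted setup
# does not specify one.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])
```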