Fixed-Weight Difference Target Propagation

Authors: Tatsukichi Shibuya, Nakamasa Inoue, Rei Kawakami, Ikuro Sato

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | FW-DTP consistently achieves higher test performance than a baseline, the Difference Target Propagation (DTP), on four classification datasets. We also present a novel propagation architecture that explains the exact form of the feedback function of DTP to analyze FW-DTP. Our code is available at https://github.com/TatsukichiShibuya/Fixed-Weight-Difference-Target-Propagation.
Researcher Affiliation | Collaboration | Tatsukichi Shibuya (1), Nakamasa Inoue (1), Rei Kawakami (1), Ikuro Sato (1,2); (1) Tokyo Institute of Technology, (2) Denso IT Laboratory; shibuya.t.ad@m.titech.ac.jp, inoue@c.titech.ac.jp, reikawa@sc.e.titech.ac.jp, isato@c.titech.ac.jp
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/TatsukichiShibuya/Fixed-Weight-Difference-Target-Propagation.
Open Datasets | Yes | We compared image classification performance of TP (Bengio 2014), DTP (Lee et al. 2015), DRL (Meulemans et al. 2020), L-DRL (Ernoult et al. 2022), and FW-DTP on four datasets: MNIST (LeCun et al. 1998), Fashion-MNIST (F-MNIST) (Xiao, Rasul, and Vollgraf 2017), and CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009).
Dataset Splits | Yes | For the hyperparameter search, 5,000 samples from the training set are used as the validation set.
Hardware Specification | Yes | 4 GPUs (Tesla P100-SXM2-16GB) with 56 CPU cores are used to measure computational time.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used for the experiments.
Experiment Setup | Yes | Following previous studies (Bartunov et al. 2018; Meulemans et al. 2020), a fully connected network consisting of 6 layers, each with 256 units, was used for MNIST and F-MNIST. Another fully connected network consisting of 4 layers, each with 1,024 units, was used for CIFAR-10/100. The activation function and the optimizer were the same as those used in the Jacobian experiment. For the hyperparameter search, 5,000 samples from the training set are used as the validation set. For DTP, DRL, and L-DRL, the feedback weights are updated five times in each iteration.
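
To make the quoted Experiment Setup and Dataset Splits rows concrete, here is a minimal PyTorch sketch of the forward networks and the validation split they describe. Only the layer counts, layer widths, and the 5,000-sample validation hold-out come from the paper; the tanh activation, the input flattening, the MNIST loading code, and the helper name make_forward_net are illustrative assumptions (the excerpt also leaves it ambiguous whether the stated layer counts include the output layer).

import torch.nn as nn
from torch.utils.data import random_split
from torchvision import datasets, transforms

def make_forward_net(in_dim, width, depth, num_classes):
    # Stack `depth` fully connected hidden layers of `width` units,
    # followed by a linear classification layer.
    layers, dim = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(dim, width), nn.Tanh()]  # activation assumed; not given in the excerpt
        dim = width
    layers.append(nn.Linear(dim, num_classes))
    return nn.Sequential(nn.Flatten(), *layers)

# MNIST / F-MNIST: 6 layers x 256 units; CIFAR-10/100: 4 layers x 1,024 units
mnist_net = make_forward_net(28 * 28, 256, depth=6, num_classes=10)
cifar100_net = make_forward_net(32 * 32 * 3, 1024, depth=4, num_classes=100)

# 5,000 training samples held out as the validation set for the hyperparameter search
full_train = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
train_set, val_set = random_split(full_train, [len(full_train) - 5000, 5000])

The split sizes match the quoted 5,000-sample validation set; optimizers, seeds, batch sizes, and the DTP/FW-DTP training loops themselves are not specified in this excerpt and are omitted.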