How Important Is Weight Symmetry in Backpropagation?
Authors: Qianli Liao, Joel Leibo, Tomaso Poggio
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using 15 different classification datasets, we systematically investigate to what extent BP really depends on weight symmetry. In a study that turned out to be surprisingly similar in spirit to Lillicrap et al.'s demonstration (Lillicrap et al. 2014) but orthogonal in its results, our experiments indicate that: (1) the magnitudes of feedback weights do not matter to performance; (2) the signs of feedback weights do matter: the more concordant the signs between feedforward and their corresponding feedback connections, the better; (3) with feedback weights having random magnitudes and 100% concordant signs, we were able to achieve the same or even better performance than SGD; (4) some normalizations/stabilizations are indispensable for such asymmetric BP to work, namely Batch Normalization (BN) (Ioffe and Szegedy 2015) and/or a Batch Manhattan (BM) update rule. (A hedged code sketch of such a sign-concordant, asymmetric update appears after this table.) |
| Researcher Affiliation | Academia | Qianli Liao, Joel Z. Leibo, and Tomaso Poggio, Center for Brains, Minds and Machines, McGovern Institute, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA |
| Pseudocode | No | The paper describes algorithms and update rules using mathematical notation and descriptive text (e.g., in Section 2 and 3) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We extensively test our algorithms on 15 datasets of 5 categories as described below. No data augmentation (e.g., cropping, flip, etc.) is used in any of the experiments. Machine learning tasks: MNIST (LeCun, Cortes, and Burges), CIFAR-10 (Krizhevsky 2009), CIFAR-100 (Krizhevsky 2009), SVHN (Netzer et al. 2011), STL10 (Coates, Ng, and Lee 2011). Standard training and testing splits were used. Basic-level categorization tasks: Caltech101 (Fei-Fei, Fergus, and Perona 2007): 102 classes, 30 training and 10 testing samples per class. Caltech256-101 (Griffin, Holub, and Perona 2007): we train/test on a subset of 102 randomly sampled classes, 30 training and 10 testing per class. iCub World dataset (Fanello et al. 2013): we followed the standard categorization protocol of this dataset. Fine-grained recognition tasks: Flowers17 (Nilsback and Zisserman 2006), Flowers102 (Nilsback and Zisserman 2008). Standard training and testing splits were used. Face identification: Pubfig83-ID (Pinto et al. 2011), SUFR-W-ID (Leibo, Liao, and Poggio 2014), LFW-ID (Huang et al. 2008). We did not follow the usual (verification) protocol of these datasets. Instead, we performed an 80-way face identification task on each dataset, where the 80 identities (IDs) were randomly sampled. Pubfig83: 85 training and 15 testing samples per ID. SUFR-W: 10 training and 5 testing per ID. LFW: 10 training and 5 testing per ID. Scene recognition: MIT-indoor67 (Quattoni and Torralba 2009): 67 classes, 80 training and 20 testing per class. Non-visual task: TIMIT-80 (Garofolo et al.): phoneme recognition using a fully-connected network; 80 classes, 400 training and 100 testing samples per class. |
| Dataset Splits | No | The paper mentions 'The best validation error among all epochs of 5 runs was recorded' and 'Standard training and testing splits were used' for some datasets, and specific training/testing sample counts for others. However, it does not describe how the validation set was constructed, nor does it give complete train/validation/test percentages or sample counts for every dataset, so the partitioning cannot be fully reproduced. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper refers to common algorithms and techniques (e.g., Batch Normalization, SGD) and mentions mini-batch training, but it does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions). |
| Experiment Setup | Yes | Momentum was used with hyperparameter 0.9 (a conventional setting). All experiments were run for 65 epochs. The base learning rate was 5 × 10⁻⁴ for epochs 1 to 50, 5 × 10⁻⁵ for epochs 51 to 60, and 5 × 10⁻⁶ for epochs 61 to 65. All models were run 5 times on each dataset with the base learning rate multiplied by 100, 10, 1, 0.1, and 0.01 respectively, because different learning algorithms favor different magnitudes of learning rates. The best validation error among all epochs of the 5 runs was recorded as each model's final performance. The batch sizes were all set to 100 unless stated otherwise. All experiments used a softmax for classification and the cross-entropy loss function. For testing with batch normalization, we compute exponential moving averages (alpha = 0.05) of training means and standard deviations over 20 mini-batches after each training epoch. (A minimal sketch of this schedule and the BN statistics update follows the table.) |
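
To make the asymmetric-BP idea above concrete, here is a minimal, illustrative sketch of a sign-concordant backward pass with a Batch Manhattan style update for a single fully connected layer. This is not the authors' code: the layer sizes, variable names (`W`, `V`, `batch_manhattan`), and the momentum-free form of the BM rule are assumptions made for illustration, and only the ideas of sign-concordant random-magnitude feedback and sign-only updates come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the paper's networks are much larger (hypothetical values).
n_in, n_out, batch = 64, 32, 100
W = 0.1 * rng.standard_normal((n_out, n_in))   # feedforward weights
b = np.zeros(n_out)
# Feedback weights: random magnitudes, but signs copied from W
# ("100% concordant signs" in the paper's terminology).
V = np.sign(W) * np.abs(rng.standard_normal(W.shape))

def forward(x):
    return x @ W.T + b

def backward(x, grad_out):
    """Asymmetric backward pass: the error sent to the layer below goes
    through the sign-concordant feedback matrix V instead of W."""
    grad_W = grad_out.T @ x / x.shape[0]
    grad_b = grad_out.mean(axis=0)
    grad_x = grad_out @ V            # sign-concordant feedback path
    return grad_W, grad_b, grad_x

def batch_manhattan(param, grad, lr):
    """Batch Manhattan style update: keep only the sign of the mini-batch
    gradient (the paper also studies momentum variants of this rule)."""
    return param - lr * np.sign(grad)

# One illustrative step on random data.
x = rng.standard_normal((batch, n_in))
grad_out = rng.standard_normal((batch, n_out))   # stand-in upstream error
gW, gb, _ = backward(x, grad_out)
W = batch_manhattan(W, gW, lr=5e-4)
b = batch_manhattan(b, gb, lr=5e-4)
```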
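
The experiment-setup row can likewise be summarized in code. The sketch below restates the reported learning-rate schedule, multipliers, momentum, batch size, and the exponential-moving-average update of batch-normalization statistics; the function and constant names are hypothetical, and the EMA helper assumes 2-D activations of shape (batch, features).

```python
import numpy as np

def base_lr(epoch):
    """Base learning-rate schedule reported in the paper (65 epochs total)."""
    if epoch <= 50:
        return 5e-4
    if epoch <= 60:
        return 5e-5
    return 5e-6

# Each model was run 5 times with the base rate scaled by one of these factors;
# the best validation error over all epochs and runs was recorded.
LR_MULTIPLIERS = [100, 10, 1, 0.1, 0.01]
MOMENTUM = 0.9
BATCH_SIZE = 100

def bn_running_stats(batches, alpha=0.05):
    """Exponential moving average of BN means/standard deviations over
    ~20 mini-batches, recomputed after each training epoch (alpha = 0.05)."""
    mean = np.zeros(batches[0].shape[1])
    std = np.ones(batches[0].shape[1])
    for x in batches:                       # x has shape (batch, features)
        mean = (1 - alpha) * mean + alpha * x.mean(axis=0)
        std = (1 - alpha) * std + alpha * x.std(axis=0)
    return mean, std

# Example: statistics from 20 random mini-batches of 64 features each.
stats = bn_running_stats([np.random.randn(BATCH_SIZE, 64) for _ in range(20)])
```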