Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures
Authors: Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we challenge this perspective, and study the applicability of Direct Feedback Alignment (DFA) to neural view synthesis, recommender systems, geometric learning, and natural language processing. In contrast with previous studies limited to computer vision tasks, our findings show that it successfully trains a large range of state-of-the-art deep learning architectures, with performance close to fine-tuned backpropagation. (Section 3, Experiments) We study the applicability of DFA to a diverse set of applications requiring state-of-the-art architectures. |
| Researcher Affiliation | Collaboration | Julien Launay¹,², Iacopo Poli¹, François Boniface¹, Florent Krzakala¹,²,³. ¹LightOn; ²LPENS, École Normale Supérieure; ³IdePHICS, EPFL. {julien, iacopo, francois, florent}@lighton.ai |
| Pseudocode | No | The paper describes the forward and backward passes using mathematical equations and prose, but it does not contain structured pseudocode or algorithm blocks. (A hedged sketch of the DFA update is given below the table.) |
| Open Source Code | Yes | All code is available on the paper website at lair.lighton.ai/dfa-scales. |
| Open Datasets | Yes | We evaluate these methods on the Criteo dataset [48], which features nearly 46 million samples of one million sparse features. We evaluate performance on three citation network datasets: Cora, CiteSeer, and PubMed [65]. ... We train a Transformer to predict the next word on the WikiText-103 dataset [81], a large collection of good and featured Wikipedia articles. |
| Dataset Splits | No | The paper mentions "validation perplexity" for the Transformer experiments but does not provide specific percentages or absolute counts for training, validation, or test dataset splits for any of the experiments described. |
| Hardware Specification | No | The paper mentions using "substantial cloud compute resources, with state-of-the-art GPU hardware," but it does not provide specific models or configurations for the GPUs, CPUs, or other hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions using "PyTorch Geometric [64]" and "Adam [83]" but does not provide specific version numbers for these or any other software dependencies used in the experiments. |
| Experiment Setup | Yes | Hyper-parameters fine-tuned for BP did not fare well with DFA, but changes in the optimizer narrowed the gap between BP and DFA considerably. The learning rate schedule used on top of Adam [83] in [63] proved detrimental. Using Adam alone required reducing the learning rate between BP and DFA. Increasing β2 from 0.98 [63] to 0.999 improved performance significantly. Finally, a simple scheduler that reduces the learning rate when the validation perplexity plateaus helped reduce it further. With the scheduler, the initial learning rate is 1 · 10⁻⁴ and it is multiplied by 0.2 when performance plateaus, with a patience of 1. (An illustrative rendering of this optimizer setup is sketched below the table.) |
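
For readers who want the mechanics behind the Pseudocode row, here is a minimal NumPy sketch of the DFA update rule the paper formalizes in its equations. The layer sizes, tanh activations, MSE loss, and feedback-matrix scale are illustrative assumptions, not the paper's architectures; the technique itself is only the replacement of backpropagation's transposed forward weights by fixed random feedback matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected network: input -> tanh -> tanh -> linear output.
# All sizes are arbitrary choices for illustration.
d_in, h1, h2, d_out = 8, 16, 16, 4
W1 = rng.normal(0.0, 0.1, (h1, d_in))
W2 = rng.normal(0.0, 0.1, (h2, h1))
W3 = rng.normal(0.0, 0.1, (d_out, h2))

# DFA: fixed random feedback matrices project the *output* error
# directly to each hidden layer, replacing the W3.T / W2.T chain
# that backpropagation would use.
B1 = rng.normal(0.0, 0.1, (h1, d_out))
B2 = rng.normal(0.0, 0.1, (h2, d_out))

def dfa_step(x, y, lr=1e-4):
    """One training step on a single example with an MSE loss."""
    global W1, W2, W3

    # Standard forward pass.
    z1 = np.tanh(W1 @ x)
    z2 = np.tanh(W2 @ z1)
    y_hat = W3 @ z2
    e = y_hat - y                      # output error dL/dy_hat for MSE

    # DFA "backward" pass: each hidden error is the global error e
    # pushed through a fixed random matrix, with no gradient flowing
    # back through the forward weights.
    d2 = (B2 @ e) * (1.0 - z2 ** 2)    # tanh'(a) = 1 - tanh(a)^2
    d1 = (B1 @ e) * (1.0 - z1 ** 2)

    # Weight updates; the output layer still receives the true error.
    W3 -= lr * np.outer(e, z2)
    W2 -= lr * np.outer(d2, z1)
    W1 -= lr * np.outer(d1, x)

# Example usage on random data.
x = rng.normal(size=d_in)
y = rng.normal(size=d_out)
dfa_step(x, y)
```

Because the feedback matrices are fixed, the per-layer error signals need no sequential backward sweep through the network, which is the source of the parallelization argument the paper makes.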
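The quoted Transformer setup also maps naturally onto a standard PyTorch optimizer plus plateau scheduler. The paper does not name a scheduler class, so reading "reduces the learning rate when the validation perplexity plateaus" as ReduceLROnPlateau is an assumption, the β1 = 0.9 default is not stated in the quote, and the Linear stand-in model is hypothetical:

```python
import torch

# Hypothetical stand-in; the paper trains a full Transformer on WikiText-103.
model = torch.nn.Linear(512, 512)

# Adam without the warmup schedule of [63]; beta2 raised from 0.98 to 0.999
# and initial learning rate 1e-4, as quoted above. beta1 = 0.9 is the
# PyTorch default and an assumption here.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

# Multiply the learning rate by 0.2 when the monitored metric (validation
# perplexity) stops improving, with a patience of 1.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.2, patience=1
)

# After each validation pass:
# scheduler.step(val_perplexity)
```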