Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians
Authors: Rainer Engelken
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that gradient flossing controls not only the gradient norm but also the condition number of the long-term Jacobian, facilitating multidimensional error feedback propagation. We find that applying gradient flossing prior to training enhances both the success rate and convergence speed for tasks involving long time horizons. For challenging tasks, we show that gradient flossing during training can further increase the time horizon that can be bridged by backpropagation through time. Moreover, we demonstrate the effectiveness of our approach on various RNN architectures and tasks of variable temporal complexity. |
| Researcher Affiliation | Academia | Rainer Engelken Zuckerman Mind Brain Behavior Institute Columbia University New York, USA re2365@columbia.edu |
| Pseudocode | Yes | A simple implementation of this algorithm in pseudocode is: Algorithm 1: Algorithm for gradient flossing of k tangent space directions (an illustrative sketch of the QR-based procedure follows the table) |
| Open Source Code | Yes | Code in Julia using Flux [77, 78] is available at https://github.com/RainerEngelken/GradientFlossing |
| Open Datasets | No | The paper uses "synthetic tasks" such as the "delayed copy task" and "delayed XOR task", which are defined within the paper, but no concrete access information (link, DOI, repository name, formal citation) for a publicly available dataset is provided. The datasets appear to be custom-generated for the experiments (an illustrative generator for a delayed-XOR-style task follows the table). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide specific train/validation/test dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states: "Code in Julia using Flux [77, 78] is available at https://github.com/RainerEngelken/GradientFlossing". While Julia and Flux are mentioned, specific version numbers for these software components are not explicitly provided in the text for reproducibility. |
| Experiment Setup | Yes | Parameters: network size N = 32 with 10 network realizations. Error bars in C indicate the 25th and 75th percentiles and the solid line shows the median. The networks were initialized with 10 different values of the initial weight strength g chosen uniformly between 0 and 1. During gradient flossing, they quickly approached three different target values of the first Lyapunov exponent λ₁^target = {1, 0.5, 0} within fewer than 100 training epochs with batch size B = 1. Parameters: g = 1, batch size b = 16, N = 80, epochs = 10⁴, T = 300, gradient flossing for E_f = 500 epochs on k = 75 tangent directions before training (a minimal configuration sketch follows the table). |
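
The pseudocode row above refers to Algorithm 1, which flosses k tangent-space directions. As a hedged illustration of the underlying mechanics (not the author's Julia/Flux implementation), the following plain-NumPy sketch estimates the k leading Lyapunov exponents by QR re-orthonormalization of tangent vectors and forms a flossing-style penalty that pulls them toward a target value; the function names and the exact form of the penalty are assumptions.

```python
# Hedged NumPy sketch (not the paper's Julia/Flux code): estimate the k leading
# Lyapunov exponents from an RNN trajectory's single-step Jacobians via QR
# re-orthonormalization, and form a flossing-style penalty on them.
import numpy as np

def lyapunov_exponents(jacobians, k, rng=None):
    """jacobians: sequence of N x N single-step Jacobians J_t along one trajectory."""
    rng = rng or np.random.default_rng(0)
    N = jacobians[0].shape[0]
    Q, _ = np.linalg.qr(rng.standard_normal((N, k)))  # random orthonormal tangent basis
    log_growth = np.zeros(k)
    for J in jacobians:
        Q, R = np.linalg.qr(J @ Q)                    # evolve tangent vectors, re-orthonormalize
        log_growth += np.log(np.abs(np.diag(R)))      # accumulate local expansion rates
    return log_growth / len(jacobians)                # time-averaged exponents

def flossing_penalty(jacobians, k, lam_target=0.0):
    """Assumed penalty form: pull the k leading exponents toward lam_target."""
    lam = lyapunov_exponents(jacobians, k)
    return np.sum((lam - lam_target) ** 2)

# Example with random, mildly contracting Jacobians of an N = 32 network:
N, k, T = 32, 8, 200
Js = [0.5 * np.eye(N) + 0.1 * np.random.randn(N, N) for _ in range(T)]
print(lyapunov_exponents(Js, k)[:3], flossing_penalty(Js, k))
```

In the paper this kind of penalty is differentiated through the recurrent dynamics by backpropagation; plain NumPy does not do that, so the sketch only shows which quantity is being regularized.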
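
The Open Datasets row notes that the delayed copy and delayed XOR tasks are custom-generated. Below is a minimal generator for a delayed-XOR-style task, assuming a binary input stream whose target at each step is the XOR of two earlier inputs; the paper's exact delays, sequence lengths, and encoding may differ.

```python
# Illustrative delayed-XOR data generator (assumed task definition).
import numpy as np

def delayed_xor_batch(batch_size, T, delay, rng=None):
    """Target at step t is the XOR of the inputs at steps t - delay and t - delay - 1."""
    rng = rng or np.random.default_rng()
    x = rng.integers(0, 2, size=(batch_size, T)).astype(np.float32)
    y = np.zeros_like(x)
    y[:, delay + 1:] = np.logical_xor(x[:, 1:T - delay],
                                      x[:, :T - delay - 1]).astype(np.float32)
    return x[..., None], y[..., None]   # shape (batch, time, 1) for RNN input/target

inputs, targets = delayed_xor_batch(batch_size=16, T=300, delay=20)
```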
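
Finally, the experiment-setup row lists the reported hyperparameters. A minimal configuration sketch collecting them is shown below; the field names are hypothetical, only the numeric values come from the setup row above, and the target Lyapunov exponent of 0 is an assumption.

```python
# Hypothetical configuration dict gathering the reported hyperparameters;
# field names are illustrative, values are taken from the setup row above.
config = dict(
    N=80,                  # hidden units
    g=1.0,                 # initial weight strength
    batch_size=16,
    T=300,                 # sequence length
    epochs=10_000,         # 10^4 training epochs
    k=75,                  # tangent-space directions flossed
    flossing_epochs=500,   # E_f epochs of gradient flossing before training
    lam_target=0.0,        # assumed target Lyapunov exponent for flossing
)
```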