Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians

Authors: Rainer Engelken

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that gradient flossing controls not only the gradient norm but also the condition number of the long-term Jacobian, facilitating multidimensional error feedback propagation. We find that applying gradient flossing prior to training enhances both the success rate and convergence speed for tasks involving long time horizons. For challenging tasks, we show that gradient flossing during training can further increase the time horizon that can be bridged by backpropagation through time. Moreover, we demonstrate the effectiveness of our approach on various RNN architectures and tasks of variable temporal complexity.
Researcher Affiliation | Academia | Rainer Engelken, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, USA, re2365@columbia.edu
Pseudocode | Yes | A simple implementation of this algorithm in pseudocode is given as Algorithm 1: "Algorithm for gradient flossing of k tangent space directions" (a hedged code sketch of this idea follows the table).
Open Source Code | Yes | Code in Julia using Flux [77, 78] is available at https://github.com/RainerEngelken/GradientFlossing
Open Datasets | No | The paper uses synthetic tasks such as the delayed copy task and the delayed XOR task, which are defined within the paper, but no concrete access information (link, DOI, repository name, or formal citation) for a publicly available dataset is provided. The tasks appear to be custom-generated for the experiments (an illustrative data-generation sketch follows the table).
Dataset Splits | No | The paper mentions training and testing but does not explicitly provide train/validation/test split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for its experiments.
Software Dependencies | No | The paper states: "Code in Julia using Flux [77, 78] is available at https://github.com/RainerEngelken/GradientFlossing". While Julia and Flux are mentioned, specific version numbers for these software components are not provided in the text, which limits reproducibility.
Experiment Setup | Yes | Parameters: network size N = 32 with 10 network realizations. Error bars in panel C indicate the 25th and 75th percentiles; the solid line shows the median. The networks were initialized with 10 different values of the initial weight strength g, chosen uniformly between 0 and 1. During gradient flossing, they quickly approached three different target values of the first Lyapunov exponent, λ1^target = {1, 0.5, 0}, within fewer than 100 training epochs with batch size B = 1. Parameters: g = 1, batch size b = 16, N = 80, epochs = 10^4, T = 300, gradient flossing for Ef = 500 epochs on k = 75 tangent space directions before training (illustrative sketches follow the table).
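
The following is a minimal sketch, not the authors' implementation, of the idea behind Algorithm 1: propagate k tangent-space directions of a simple tanh RNN with QR re-orthonormalization, read off finite-time Lyapunov exponents, and penalize their distance from a target value. The network sizes, the function names (`lyapunov_exponents`, `flossing_loss`), and the omission of inputs are assumptions made for brevity; in the released Julia/Flux code this kind of loss is differentiated with automatic differentiation and minimized alongside the task loss, which is not shown here.

```julia
# Minimal sketch (not the authors' code): finite-time Lyapunov exponents of k
# tangent-space directions of a vanilla tanh RNN, and a "flossing" loss that
# penalizes their distance from a target value.
using LinearAlgebra, Random

# Hypothetical sizes for illustration only (the paper's experiments use, e.g., k = 75).
N, k, T = 80, 20, 300
rng = MersenneTwister(0)
W = randn(rng, N, N) ./ sqrt(N)   # recurrent weights (weight strength g = 1)
h = randn(rng, N)                 # initial hidden state

# Propagate k orthonormal tangent vectors alongside the trajectory and
# re-orthonormalize them with QR; the accumulated log|R_ii| give the
# finite-time Lyapunov exponents λ_1, ..., λ_k.
function lyapunov_exponents(W, h, k, T)
    Q = Matrix(qr(randn(size(W, 1), k)).Q)  # initial orthonormal tangent basis
    γ = zeros(k)                            # running log-stretch factors
    for _ in 1:T
        h = tanh.(W * h)                    # RNN state update (inputs omitted for brevity)
        D = Diagonal(1 .- h .^ 2)           # tanh derivative at the new state
        J = D * W                           # one-step Jacobian ∂h_t / ∂h_{t-1}
        F = qr(J * Q)                       # push tangent vectors forward, re-orthonormalize
        Q = Matrix(F.Q)
        γ .+= log.(abs.(diag(F.R)))
    end
    return γ ./ T
end

# Flossing loss: drive the first k Lyapunov exponents toward a target (here 0).
flossing_loss(W, h, k, T; λ_target = 0.0) =
    sum(abs2, lyapunov_exponents(W, h, k, T) .- λ_target)

@show flossing_loss(W, h, k, T)
```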
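The delayed copy and delayed XOR tasks are specified in the paper itself, not here; as a rough illustration of what such synthetic data can look like, below is a generic delayed-XOR-style batch generator. The function name `delayed_xor_batch`, the input encoding, and the single-target-at-the-last-step convention are assumptions, not the paper's exact task definition.

```julia
# Illustrative sketch only: a generic "delayed XOR"-style batch generator.
using Random

# Generate B sequences of length T with one binary input channel; the target is
# the XOR of the bits presented at two time steps separated by `delay`.
function delayed_xor_batch(rng, B, T, delay)
    x = zeros(Float32, 1, T, B)   # (features, time, batch)
    y = zeros(Float32, 1, B)      # target read out at the final time step
    for b in 1:B
        t1 = rand(rng, 1:(T - delay))
        t2 = t1 + delay
        a, c = rand(rng, 0:1), rand(rng, 0:1)
        x[1, t1, b] = a
        x[1, t2, b] = c
        y[1, b] = xor(a, c)
    end
    return x, y
end

x, y = delayed_xor_batch(MersenneTwister(1), 16, 300, 200)
```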
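For reference, the hyperparameters quoted in the experiment setup row can be collected into a single configuration object; the field names below are illustrative and not taken from the released code.

```julia
# Hypothetical configuration collecting the quoted hyperparameters.
config = (
    N       = 80,      # network size
    g       = 1.0,     # initial weight strength
    batch   = 16,      # batch size b
    epochs  = 10^4,    # training epochs
    T       = 300,     # sequence length
    E_floss = 500,     # gradient-flossing epochs before training (Ef)
    k       = 75,      # number of tangent-space directions flossed
)
```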