Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians
Authors: Rainer Engelken
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that gradient flossing controls not only the gradient norm but also the condition number of the long-term Jacobian, facilitating multidimensional error feedback propagation. We find that applying gradient flossing prior to training enhances both the success rate and convergence speed for tasks involving long time horizons. For challenging tasks, we show that gradient flossing during training can further increase the time horizon that can be bridged by backpropagation through time. Moreover, we demonstrate the effectiveness of our approach on various RNN architectures and tasks of variable temporal complexity. |
| Researcher Affiliation | Academia | Rainer Engelken Zuckerman Mind Brain Behavior Institute Columbia University New York, USA re2365@columbia.edu |
| Pseudocode | Yes | A simple implementation of this algorithm in pseudocode is: Algorithm 1: Algorithm for gradient flossing of k tangent space directions (an illustrative sketch of the QR-based procedure follows the table) |
| Open Source Code | Yes | Code in Julia using Flux [77, 78] is available at https://github.com/RainerEngelken/GradientFlossing |
| Open Datasets | No | The paper uses "synthetic tasks" such as the "delayed copy task" and "delayed XOR task", which are defined within the paper, but no concrete access information (link, DOI, repository name, formal citation) for a publicly available dataset is provided. The datasets appear to be custom-generated for the experiments (an illustrative generator for a delayed-XOR-style task follows the table). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide specific train/validation/test dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states: "Code in Julia using Flux [77, 78] is available at https://github.com/RainerEngelken/GradientFlossing". While Julia and Flux are mentioned, specific version numbers for these software components are not explicitly provided in the text for reproducibility. |
| Experiment Setup | Yes | Parameters: network size N = 32 with 10 network realizations. Error bars in C indicate the 25th and 75th percentiles and the solid line shows the median. The networks were initialized with 10 different values of the initial weight strength g chosen uniformly between 0 and 1. During gradient flossing, they quickly approached three different target values of the first Lyapunov exponent λ₁^target = {1, 0.5, 0} within fewer than 100 training epochs with batch size B = 1. Parameters: g = 1, batch size b = 16, N = 80, epochs = 10⁴, T = 300, gradient flossing for E_f = 500 epochs on k = 75 tangent directions before training (a minimal configuration sketch follows the table). |
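
The pseudocode row above refers to Algorithm 1, which flosses k tangent-space directions. As a hedged illustration of the underlying mechanics (not the author's Julia/Flux implementation), the following plain-NumPy sketch estimates the k leading Lyapunov exponents by QR re-orthonormalization of tangent vectors and forms a flossing-style penalty that pulls them toward a target value; the function names and the exact form of the penalty are assumptions.

```python
# Hedged NumPy sketch (not the paper's Julia/Flux code): estimate the k leading
# Lyapunov exponents from an RNN trajectory's single-step Jacobians via QR
# re-orthonormalization, and form a flossing-style penalty on them.
import numpy as np

def lyapunov_exponents(jacobians, k, rng=None):
    """jacobians: sequence of N x N single-step Jacobians J_t along one trajectory."""
    rng = rng or np.random.default_rng(0)
    N = jacobians[0].shape[0]
    Q, _ = np.linalg.qr(rng.standard_normal((N, k)))  # random orthonormal tangent basis
    log_growth = np.zeros(k)
    for J in jacobians:
        Q, R = np.linalg.qr(J @ Q)                    # evolve tangent vectors, re-orthonormalize
        log_growth += np.log(np.abs(np.diag(R)))      # accumulate local expansion rates
    return log_growth / len(jacobians)                # time-averaged exponents

def flossing_penalty(jacobians, k, lam_target=0.0):
    """Assumed penalty form: pull the k leading exponents toward lam_target."""
    lam = lyapunov_exponents(jacobians, k)
    return np.sum((lam - lam_target) ** 2)

# Example with random, mildly contracting Jacobians of an N = 32 network:
N, k, T = 32, 8, 200
Js = [0.5 * np.eye(N) + 0.1 * np.random.randn(N, N) for _ in range(T)]
print(lyapunov_exponents(Js, k)[:3], flossing_penalty(Js, k))
```

In the paper this kind of penalty is differentiated through the recurrent dynamics by backpropagation; plain NumPy does not do that, so the sketch only shows which quantity is being regularized.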
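
The Open Datasets row notes that the delayed copy and delayed XOR tasks are custom-generated. Below is a minimal generator for a delayed-XOR-style task, assuming a binary input stream whose target at each step is the XOR of two earlier inputs; the paper's exact delays, sequence lengths, and encoding may differ.

```python
# Illustrative delayed-XOR data generator (assumed task definition).
import numpy as np

def delayed_xor_batch(batch_size, T, delay, rng=None):
    """Target at step t is the XOR of the inputs at steps t - delay and t - delay - 1."""
    rng = rng or np.random.default_rng()
    x = rng.integers(0, 2, size=(batch_size, T)).astype(np.float32)
    y = np.zeros_like(x)
    y[:, delay + 1:] = np.logical_xor(x[:, 1:T - delay],
                                      x[:, :T - delay - 1]).astype(np.float32)
    return x[..., None], y[..., None]   # shape (batch, time, 1) for RNN input/target

inputs, targets = delayed_xor_batch(batch_size=16, T=300, delay=20)
```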
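
Finally, the experiment-setup row lists the reported hyperparameters. A minimal configuration sketch collecting them is shown below; the field names are hypothetical, only the numeric values come from the setup row above, and the target Lyapunov exponent of 0 is an assumption.

```python
# Hypothetical configuration dict gathering the reported hyperparameters;
# field names are illustrative, values are taken from the setup row above.
config = dict(
    N=80,                  # hidden units
    g=1.0,                 # initial weight strength
    batch_size=16,
    T=300,                 # sequence length
    epochs=10_000,         # 10^4 training epochs
    k=75,                  # tangent-space directions flossed
    flossing_epochs=500,   # E_f epochs of gradient flossing before training
    lam_target=0.0,        # assumed target Lyapunov exponent for flossing
)
```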