On the difficulty of learning chaotic dynamics with RNNs

Authors: Jonas Mikhaeil, Zahra Monfared, Daniel Durstewitz

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we offer a comprehensive theoretical treatment of this problem by relating the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits. We mathematically prove that RNNs producing stable equilibrium or cyclic behavior have bounded gradients, whereas the gradients of RNNs with chaotic dynamics always diverge. Based on these analyses and insights we suggest ways of how to optimize the training process on chaotic data according to the system's Lyapunov spectrum, regardless of the employed RNN architecture. ... We illustrate the implications of our theory for RNN training on several simulated and empirical chaotic time series, and adapt the idea of sparsely forced Back-Propagation Through Time (BPTT) as a simple yet effective remedy that enables to learn the underlying dynamics despite exploding gradients.
Researcher Affiliation | Academia | Jonas M. Mikhaeil1,2,*, Zahra Monfared1,4,*, and Daniel Durstewitz1,2,3; j.mikhaeil@columbia.edu, {zahra.monfared, daniel.durstewitz}@zi-mannheim.de; 1Department of Theoretical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany; 2Faculty of Physics and Astronomy, Heidelberg University, Heidelberg, Germany; 3Interdisciplinary Center for Scientific Computing, Heidelberg University; 4Department of Mathematics & Informatics and Cluster of Excellence STRUCTURES, Heidelberg University, Heidelberg, Germany
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | All code from this paper is available at https://github.com/DurstewitzLab/ChaosRNN.
Open Datasets | Yes | Similar results are reported for another real-world dataset, electroencephalogram (EEG) recordings, in Appx. A.6.5. ... We used the PhysioNet EEG motor imagery dataset (Dataset 1 of the BCI2000 competition IV [79]), accessible through PhysioBank [27].
Dataset Splits | No | The paper does not provide explicit training, validation, and test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper states "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]" in its checklist, and no specific hardware details are found in the main text.
Software Dependencies | No | As optimizer we used Adam [47] from PyTorch [68] with a learning rate of 0.001. ... The maximal Lyapunov exponent was determined with the TISEAN package [30]. (No version numbers provided for PyTorch or TISEAN.)
Experiment Setup | Yes | As optimizer we used Adam [47] from PyTorch [68] with a learning rate of 0.001. For all models, training proceeded solely by sparsely forced BPTT and did not employ gradient clipping or any other technique that may interfere with optimal loss truncation. ... The training procedure was the same as for sparsely forced BPTT, except that instead of supplying a control-signal, gradients were normalized to 1 prior to each parameter update (see Fig. 13 for a more systematic evaluation of different clipping procedures and thresholds). (Minimal sketches of both procedures are given after the table.)
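
The rows above describe the sparsely forced BPTT training setup (Adam, learning rate 0.001, no gradient clipping) only in prose. Below is a minimal PyTorch sketch of one way to realize sparse teacher forcing: the network runs freely on its own predictions and is reset to the ground-truth observation only every `forcing_interval` steps. The names (`SimpleRNNModel`, `sparsely_forced_bptt_step`, `forcing_interval`) and the choice to force via the input rather than the latent state are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SimpleRNNModel(nn.Module):
    """Vanilla RNN cell with a linear read-out; stands in for any RNN architecture."""
    def __init__(self, obs_dim, hidden_dim):
        super().__init__()
        self.cell = nn.RNNCell(obs_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, obs_dim)

    def step(self, x_t, h):
        h = self.cell(x_t, h)
        return self.readout(h), h

def sparsely_forced_bptt_step(model, optimizer, sequence, forcing_interval):
    """One parameter update on a single (T, obs_dim) sequence."""
    T = sequence.shape[0]
    h = torch.zeros(1, model.cell.hidden_size)
    x_t = sequence[0:1]                      # start from the first observation
    loss = torch.zeros(())
    for t in range(1, T):
        pred, h = model.step(x_t, h)
        loss = loss + ((pred - sequence[t:t + 1]) ** 2).mean()
        # Sparse forcing: only every `forcing_interval` steps is the ground-truth
        # observation fed back in; otherwise the model runs on its own prediction.
        x_t = sequence[t:t + 1] if t % forcing_interval == 0 else pred
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage (dimensions, data, and forcing interval are illustrative placeholders):
model = SimpleRNNModel(obs_dim=3, hidden_dim=64)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr as stated in the paper
data = torch.randn(200, 3)                                  # stand-in for a chaotic time series
sparsely_forced_bptt_step(model, optimizer, data, forcing_interval=10)
```

Per the abstract excerpt above, the forcing schedule would be chosen according to the system's Lyapunov spectrum; the fixed `forcing_interval` used here is only a placeholder.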
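The Experiment Setup row also mentions a comparison in which, instead of sparse forcing, gradients were normalized to 1 prior to each parameter update. A minimal sketch of that baseline follows; the helper name `normalize_gradients` and the epsilon value are assumptions. Note this is distinct from standard norm clipping, which rescales only when the norm exceeds a threshold.

```python
import torch

def normalize_gradients(parameters, eps=1e-12):
    """Rescale the global gradient norm of `parameters` to 1 (hypothetical helper)."""
    grads = [p.grad for p in parameters if p.grad is not None]
    total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    for g in grads:
        g.div_(total_norm + eps)

# In the training loop, after loss.backward() and before optimizer.step():
#     normalize_gradients(model.parameters())
```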