Bifurcations and loss jumps in RNN training

Authors: Lukas Eisenmann, Zahra Monfared, Niclas Göring, Daniel Durstewitz

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we first mathematically prove for a particular class of ReLU-based RNNs that certain bifurcations are indeed associated with loss gradients tending toward infinity or zero. We then introduce a novel heuristic algorithm for detecting all fixed points and k-cycles in ReLU-based RNNs and their existence and stability regions, hence bifurcation manifolds in parameter space. ... We exemplify the algorithm on the analysis of the training process of RNNs, and find that the recently introduced technique of generalized teacher forcing completely avoids certain types of bifurcations in training.
Researcher Affiliation | Academia | 1 Department of Theoretical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany; 2 Faculty of Physics and Astronomy, Heidelberg University, Heidelberg, Germany; 3 Interdisciplinary Center for Scientific Computing, Heidelberg University
Pseudocode | Yes | Algorithm 1 SCYFI. The algorithm is iteratively run with k = 1, ..., Kmax, with Kmax the maximum order of cycles tested. (A minimal illustrative sketch of the per-region fixed-point computation follows the table.)
Open Source Code | Yes | The whole procedure is formalized in Algorithm 1, and the code is available at https://github.com/DurstewitzLab/SCYFI.
Open Datasets | Yes | Next we illustrate the application of SCYFI on a real-world example, learning the behavior of a rodent spiking cortical neuron observed through time series measurements of its membrane potential... For this, we produced time series of membrane voltage and a gating variable from a biophysical neuron model [17], on which we trained a dendPLRNN [10] using BPTT [52] with sparse teacher forcing (STF) [37].
Dataset Splits | No | No explicit information on training/validation/test dataset splits is provided.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instances) used for the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for its experiments.
Experiment Setup | No | While the paper describes the models used (e.g., M=6 latent states, H=20 hidden dimensions), it does not provide specific experimental setup details such as learning rates, batch sizes, number of epochs, or optimizer settings for training.
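To give a concrete feel for what SCYFI searches for, below is a minimal, self-contained sketch of the per-region fixed-point computation in a piecewise-linear (ReLU-based) RNN, assuming the standard PLRNN form z_{t+1} = A z_t + W ReLU(z_t) + h with diagonal A and off-diagonal W. This is not the authors' implementation (the released code at https://github.com/DurstewitzLab/SCYFI is the reference, and the paper's dendPLRNN adds further structure); the function name is hypothetical, and the brute-force enumeration over all 2^M ReLU regions is only for illustration, whereas SCYFI replaces it with a heuristic search.

```python
# Minimal sketch (not the authors' SCYFI implementation): brute-force search for
# fixed points of a piecewise-linear RNN  z_{t+1} = A z_t + W relu(z_t) + h.
# Within each ReLU linearity region, given by a binary activation pattern d,
# the map is linear and a candidate fixed point solves (I - A - W D) z* = h.
# A candidate is kept only if its sign pattern is consistent with d.
import itertools
import numpy as np

def fixed_points_bruteforce(A, W, h, tol=1e-9):
    """Enumerate all 2^M ReLU regions (feasible only for small M)."""
    M = len(h)
    fps = []
    for pattern in itertools.product([0.0, 1.0], repeat=M):
        D = np.diag(pattern)
        try:
            z = np.linalg.solve(np.eye(M) - A - W @ D, h)
        except np.linalg.LinAlgError:
            continue  # degenerate region: no unique linear fixed point here
        # consistency check: z_i > 0 exactly where the pattern marks the unit active
        if np.all((z > tol) == (np.asarray(pattern) > 0.5)):
            fps.append(z)
    return fps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = 6
    A = np.diag(rng.uniform(0.1, 0.9, M))   # stable diagonal part
    W = 0.5 * rng.standard_normal((M, M))
    np.fill_diagonal(W, 0.0)                 # off-diagonal coupling
    h = rng.standard_normal(M)
    for z_star in fixed_points_bruteforce(A, W, h):
        print(np.round(z_star, 3))
```

Under the same assumed model form, the region-wise idea extends to k-cycles by composing k region-linear maps before solving the resulting linear system, and the stability of each fixed point follows from the eigenvalues of A + W D in its region; SCYFI's contribution is to locate these solutions without exhaustively enumerating regions.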