Bifurcations and loss jumps in RNN training
Authors: Lukas Eisenmann, Zahra Monfared, Niclas Göring, Daniel Durstewitz
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we first mathematically prove for a particular class of ReLU-based RNNs that certain bifurcations are indeed associated with loss gradients tending toward infinity or zero. We then introduce a novel heuristic algorithm for detecting all fixed points and k-cycles in ReLU-based RNNs and their existence and stability regions, hence bifurcation manifolds in parameter space. ... We exemplify the algorithm on the analysis of the training process of RNNs, and find that the recently introduced technique of generalized teacher forcing completely avoids certain types of bifurcations in training. |
| Researcher Affiliation | Academia | 1Department of Theoretical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany 2Faculty of Physics and Astronomy, Heidelberg University, Heidelberg, Germany 3Interdisciplinary Center for Scientific Computing, Heidelberg University |
| Pseudocode | Yes | Algorithm 1 SCYFI. The algorithm is iteratively run with k = 1, …, Kmax, with Kmax the max. order of cycles tested |
| Open Source Code | Yes | The whole procedure is formalized in Algorithm 1, and the code is available at https://github.com/DurstewitzLab/SCYFI. |
| Open Datasets | Yes | Next we illustrate the application of SCYFI on a real-world example, learning the behavior of a rodent spiking cortical neuron observed through time series measurements of its membrane potential... For this, we produced time series of membrane voltage and a gating variable from a biophysical neuron model [17], on which we trained a dendPLRNN [10] using BPTT [52] with sparse teacher forcing (STF) [37]. |
| Dataset Splits | No | No explicit information on training/validation/test dataset splits is provided. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instances) used for the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for its experiments. |
| Experiment Setup | No | While the paper describes the models used (e.g., M=6 latent states, H=20 hidden dimensions), it does not provide specific experimental setup details such as learning rates, batch sizes, number of epochs, or optimizer settings for training. |
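For context on what the paper's Algorithm 1 (SCYFI) computes: in a ReLU-based RNN the state space decomposes into linear regions, one per ReLU on/off pattern, and within each region a fixed point is the solution of a linear system that must then lie in that same region. The sketch below illustrates this idea by brute-force enumeration of all 2^M regions for a PLRNN-style update z ← A z + W relu(z) + h. This is an illustrative assumption-laden toy, not the paper's algorithm: SCYFI replaces the exponential enumeration with a heuristic search over regions.

```python
import itertools
import numpy as np

def fixed_points(A, W, h):
    """Brute-force all fixed points of the ReLU-based RNN
        z_{t+1} = A z_t + W relu(z_t) + h
    by enumerating the 2^M ReLU configurations (linear regions).
    Illustrative only; infeasible for large M."""
    M = len(h)
    I = np.eye(M)
    points = []
    for mask in itertools.product([0.0, 1.0], repeat=M):
        D = np.diag(mask)                 # ReLU on/off pattern of this region
        J = A + W @ D - I                 # fixed point solves (A + W D - I) z = -h
        if abs(np.linalg.det(J)) < 1e-12:
            continue                      # degenerate region, skip
        z = np.linalg.solve(J, -h)
        # consistency check: z must actually lie in the region encoded by D
        if np.all((z > 0) == (np.asarray(mask) == 1.0)):
            points.append(z)
    return points
```

With W = 0 and A = 0.5 I the map is globally linear, so exactly one consistent region survives and the unique fixed point is z = h / 0.5. Stability of each returned point could then be read off from the eigenvalues of A + W D, which is how stability regions (and hence bifurcation boundaries) are delineated per region.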