Optimal Recurrent Network Topologies for Dynamical Systems Reconstruction
Authors: Christoph Jürgen Hemmer, Manuel Brenner, Florian Hess, Daniel Durstewitz
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For our numerical studies we focus on a well-established SOTA model and training algorithm for DSR, the piecewise linear recurrent neural network (PLRNN), first introduced in (Durstewitz, 2017), but also checked LSTMs and vanilla RNNs to highlight that our results are more general. ... More specifically, we evaluated DSR performance on several DS benchmarks for three different iterative pruning protocols (Algorithm 1; Fig. 3). (A minimal sketch of the PLRNN update step is given after the table.) |
| Researcher Affiliation | Academia | ¹Department of Theoretical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany; ²Faculty of Physics and Astronomy, Heidelberg University, Heidelberg, Germany; ³Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg, Germany. |
| Pseudocode | Yes | Pseudo-code for the iterative pruning procedure, retaining initial parameters $\theta_0$ but updating the mask in each iteration, is given in Algorithm 1. ... Based on such a topological characterization of network graphs obtained through geometric pruning of trained PLRNNs, in Sect. 4.4 we derive an algorithm that creates an adjacency matrix $A_{\mathrm{adj}}$ with the desired properties, which can be used as a mask $m$ in Eqn. 4. Fig. 2 illustrates the general approach. ... Algorithm 2: Geo Hub network algorithm. (A hedged sketch of mask-based iterative pruning follows the table.) |
| Open Source Code | Yes | All code created is available at https://github.com/DurstewitzLab/RNNtopoDSR. |
| Open Datasets | Yes | First, the Lorenz-63 model of atmospheric convection, proposed by Edward Lorenz (Lorenz, 1963), produces a chaotic attractor with iconic butterfly-wing structure (Fig. 1) and is probably the most commonly employed benchmark in this whole literature. ... Third, as a real-world example, we used human electrocardiogram (ECG) data bearing signatures of chaos, with a positive maximum Lyapunov exponent (see Hess et al. (2023)). ... Electrocardiogram (ECG) time series were taken from the PPG-DaLiA dataset (Reiss et al., 2019). |
| Dataset Splits | No | The paper describes training on time series data and evaluating performance for DSR. It states 'Across all training epochs of a given run, we consistently (for all comparisons and protocols) selected the model with the lowest $D_{\mathrm{stsp}}$.', which implies a validation process for model selection. However, it does not provide explicit details about data splits for training, validation, or testing (e.g., percentages or sample counts for each split), only general statements about drawing 'trajectories of $10^5$ time steps ... for training'. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments. It focuses on the models, algorithms, and results without specifying the computational infrastructure. |
| Software Dependencies | No | The paper mentions software components like 'rectified adaptive moment estimation (RADAM) (Liu et al., 2020) as the optimizer' and the DynamicalSystems.jl Julia library. However, it does not provide specific version numbers for these software dependencies, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | For each training epoch we sample several subsequences of length $\tilde{T}$, $x^{(p)}_{1:\tilde{T}} = x_{t_p:t_p+\tilde{T}}$, where $t_p \in [1, T-\tilde{T}]$ is chosen randomly. These subsequences $\{x^{(p)}_{1:\tilde{T}}\}_{p=1}^{S}$ are then arranged into a batch of size $S$. ... We took rectified adaptive moment estimation (RADAM) (Liu et al., 2020) as the optimizer, using $L = 50$ batches of size $S = 16$ in each epoch. We chose $M = \{50, 100, 100, 50, 100\}$, $\tau = \{16, 10, 5, 8, 8\}$, $\tilde{T} = \{200, 50, 50, 300, 200\}$, $\eta_{\mathrm{start}} = \{10^{-2}, 10^{-3}, 10^{-3}, 5 \cdot 10^{-3}, 5 \cdot 10^{-3}\}$, and epochs = {2000, 3000, 4000, 3000, 3000} for the {Lorenz-63, ECG, Bursting Neuron, Rössler, Lorenz-96} settings, respectively, and $\eta_{\mathrm{end}} = 10^{-5}$ for all settings. Parameters in $W$ were initialized using a Gaussian initialization with $\sigma = 0.01$, $h$ simply as a vector of zeros, and $A$ as the diagonal of a normalized positive-definite random matrix. (A sketch of this subsequence sampling and initialization follows the table.) |
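
For context on the model referenced in the Research Type row: the PLRNN's latent dynamics follow the piecewise linear update $z_t = A z_{t-1} + W \phi(z_{t-1}) + h$ with $\phi = \mathrm{ReLU}$, diagonal $A$, and off-diagonal coupling matrix $W$ (Durstewitz, 2017). The following is a minimal sketch under that standard formulation; the function names and the omission of external inputs and the observation model are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def plrnn_step(z, A, W, h):
    """One latent update of a piecewise linear RNN (PLRNN):
    z_t = A @ z_{t-1} + W @ relu(z_{t-1}) + h."""
    return A @ z + W @ np.maximum(z, 0.0) + h

def generate(z0, A, W, h, T):
    """Freely evolve the latent state for T steps starting from z0."""
    traj = np.empty((T, z0.shape[0]))
    z = z0
    for t in range(T):
        z = plrnn_step(z, A, W, h)
        traj[t] = z
    return traj
```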
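The Pseudocode row describes iterative pruning in which a binary mask $m$ (applied in Eqn. 4) is updated in each iteration while the initial parameters $\theta_0$ are retained, in the spirit of lottery-ticket rewinding. Below is a minimal magnitude-based sketch of such a protocol; the generic `train` callable and the fixed per-iteration pruning fraction are illustrative assumptions, not the paper's Algorithm 1 (which, notably, also considers geometric pruning criteria).

```python
import numpy as np

def iterative_pruning(theta0, train, n_iters=10, frac=0.2):
    """Iterative magnitude pruning with rewinding to theta0.

    theta0 : initial weight matrix (retained across all iterations)
    train  : callable (theta_init, mask) -> trained weights, same shape
    frac   : fraction of surviving weights pruned per iteration
    Returns the final binary mask and the last trained weights.
    """
    mask = np.ones_like(theta0)
    theta = theta0
    for _ in range(n_iters):
        theta = train(theta0 * mask, mask)   # always restart from theta0
        surviving = np.abs(theta)[mask == 1]
        k = int(frac * surviving.size)       # number of weights to drop
        if k == 0:
            break
        thresh = np.sort(surviving)[k - 1]   # k-th smallest surviving magnitude
        mask[(np.abs(theta) <= thresh) & (mask == 1)] = 0.0
    return mask, theta
```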
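Finally, a minimal sketch of the subsequence batching and parameter initialization described in the Experiment Setup row. The zero-indexed start range mirrors $t_p \in [1, T-\tilde{T}]$; the exact normalization of the positive-definite matrix used for $A$ is not specified in the excerpt, so the Frobenius-norm choice below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batch(x, T_sub, S=16):
    """Draw S random subsequences of length T_sub from a series x of length T."""
    T = x.shape[0]
    starts = rng.integers(0, T - T_sub + 1, size=S)    # zero-indexed t_p
    return np.stack([x[t:t + T_sub] for t in starts])  # shape (S, T_sub, ...)

def init_params(M):
    """Initialization as reported: Gaussian W (sigma = 0.01), zero h,
    A as the diagonal of a normalized positive-definite random matrix."""
    W = rng.normal(0.0, 0.01, size=(M, M))
    h = np.zeros(M)
    R = rng.normal(size=(M, M))
    P = R @ R.T                    # positive (semi-)definite by construction
    P /= np.linalg.norm(P)         # normalization choice is an assumption
    A = np.diag(np.diag(P))        # keep only the diagonal
    return A, W, h
```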