Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Learning Dynamics of RNNs in Closed-Loop Environments

Authors: Yoav Ger, Omri Barak

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We begin by showing that closed-loop and open-loop training produce fundamentally different learning dynamics, even when using identical architectures and converging to the same final solution. To investigate this divergence, we focus on the largely understudied dynamics of closed-loop RNNs. Specifically, we show that tracking the eigenvalues of the coupled agent-environment system (rather than the RNN alone) is both necessary and sufficient to uncover the structure of the learning process, which unfolds in distinct stages reflected in the spectrum and training loss. Notably, closed-loop learning gives rise to a natural trade-off between two competing objectives: myopic, short-sighted policy improvement and long-term system-level stability. Finally, we demonstrate that similar learning dynamics arise in a more complex motor control task, with RNNs progressing through stages similar to those observed in human experiments.
Researcher Affiliation Academia Yoav Ger, Omri Barak. Rappaport Faculty of Medicine and Network Biology Research Laboratory, Technion, Israel Institute of Technology
Pseudocode No The paper describes the methodologies and models using mathematical equations and textual explanations, but it does not contain any explicitly labeled pseudocode blocks or algorithms.
Open Source Code Yes All code was implemented in Python using PyTorch [54] and is available on GitHub: https://github.com/yoavger/closed_loop_rnn_learning_dynamics
Open Datasets No The paper defines specific control tasks ('double integrator task' and 'multi-frequency tracking tasks') and their dynamics within the text. It does not use external, publicly available datasets that require specific links or citations for access. For example: "Our task environment is the classic discrete-time double integrator control problem [29, 30]" and "we trained RNNs on a two-dimensional tracking task inspired by human motor control studies [42, 43]".
Dataset Splits No The paper describes how initial conditions and target trajectories are generated for the tasks, such as "The initial mass state is sampled uniformly: x₀ ∼ U([−2, 2]²)" and "Target trajectories were generated as described above, with random phases ϕᵢ ∼ U(−π, π) resampled each episode". This indicates on-the-fly generation of scenarios rather than the use of pre-defined training, test, or validation splits for a fixed dataset.
Hardware Specification No The paper states in its NeurIPS checklist: "All simulations are lightweight and can be run on an off-the-shelf laptop. No specialized hardware or extensive computational resources are required." This is a general statement about computational requirements but does not provide specific hardware details (e.g., CPU/GPU model, memory).
Software Dependencies No All code was implemented in Python using PyTorch [54] and is available on GitHub: https://github.com/yoavger/closed_loop_rnn_learning_dynamics. While PyTorch is mentioned, a specific version number is not provided, nor are other software dependencies with their versions.
Experiment Setup Yes Each network consists of N = 100 neurons, an episode length of T = 50, and the hyperbolic tangent activation function, ϕ(·) = tanh. The input and output weight vectors were initialized independently from N(0, 1/N), and the recurrent weight matrix W was initialized from N(0, g²/N), where g controls the initial recurrent strength. Training was performed using stochastic gradient descent on W, m, and z, with a learning rate of η = 10⁻², a batch size of 100, and gradient clipping (2-norm capped at 1) to avoid exploding gradients.
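
The Experiment Setup entry can be made concrete with a short sketch. The block below is a dependency-free illustration in plain Python, not the authors' PyTorch code. It draws W, m, and z at the stated scales and applies one SGD update with the gradient 2-norm capped at 1. Several points are assumptions rather than facts from the paper: the function names are hypothetical, the learning rate is read as 10⁻² (the exponent sign is lost in the extracted text), the clipping is applied to a single norm taken over all gradient values together, and the value of g used in the usage note is arbitrary because the summary does not state it.

```python
import math
import random

N = 100               # number of neurons (stated in the setup)
T = 50                # episode length (stated in the setup; unused in this sketch)
LEARNING_RATE = 1e-2  # assumption: the garbled "10 2" is read as 10^-2
BATCH_SIZE = 100      # stated in the setup; unused in this sketch
CLIP_NORM = 1.0       # gradient 2-norm cap (stated in the setup)


def init_parameters(n, g, rng):
    """Draw m, z ~ N(0, 1/n) and W ~ N(0, g^2/n) (hypothetical helper)."""
    vec_std = 1.0 / math.sqrt(n)   # standard deviation for variance 1/n
    rec_std = g / math.sqrt(n)     # standard deviation for variance g^2/n
    m = [rng.gauss(0.0, vec_std) for _ in range(n)]
    z = [rng.gauss(0.0, vec_std) for _ in range(n)]
    W = [[rng.gauss(0.0, rec_std) for _ in range(n)] for _ in range(n)]
    return W, m, z


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradients so its 2-norm is at most max_norm.

    Assumption: one norm is computed over all values together.
    """
    norm = math.sqrt(sum(v * v for v in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [v * scale for v in grads]


def sgd_step(params, grads, lr=LEARNING_RATE, max_norm=CLIP_NORM):
    """One plain SGD update on flat parameter and gradient lists, with clipping."""
    clipped = clip_by_global_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]
```

For example, `init_parameters(N, 1.5, random.Random(0))` returns a 100 x 100 recurrent matrix together with two length-100 vectors, and `sgd_step([1.0, 1.0], [3.0, 4.0])` first shrinks the gradient from norm 5 to norm 1 before taking the step. In PyTorch the same clipping would typically be done with `torch.nn.utils.clip_grad_norm_`, although the summary does not say which call the authors used.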