Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Error Forcing in Recurrent Neural Networks

Authors: A. Erdem Sağtekin, Colin Bredenberg, Cristina Savin

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, EF consistently outperforms other learning algorithms across several tasks and its benefits persist when additional biological constraints are taken into account. In this section, we evaluate the effectiveness of the error forcing (EF) mechanism and compare it with competing algorithms.
Researcher Affiliation	Academia	A. Erdem Sağtekin (New York University; University of Tübingen), Colin Bredenberg (University of Montreal; Mila Quebec AI Institute), Cristina Savin (New York University)
Pseudocode	No	The paper describes algorithms (e.g., error forcing algorithm) and presents computational graphs in figures, but it does not contain explicit pseudocode or algorithm blocks with structured steps.
Open Source Code	Yes	Our code is available at https://github.com/Savin-Lab-Code/error-forcing.
Open Datasets	No	The paper uses synthetic data generated for specific tasks (delayed XOR, sine wave generation, evidence integration) rather than relying on external, publicly available datasets. For example: "This is a nonlinear working memory task in which two input channels deliver rectangular pulses of amplitude +1 or −1, separated by a fixed inter-stimulus delay at the start of each trial." The text describes the generation process of the data for each task, but does not provide access information for a pre-existing public dataset.
Dataset Splits	No	The paper describes that "During each epoch, 12 batches were used for testing, i.e., without forcing." and mentions the number of trials per batch for each task. However, it does not specify traditional train/test/validation dataset splits (e.g., percentages or fixed sample counts) for a static dataset, as the data is generated per trial/batch.
Hardware Specification	Yes	The experiments were conducted using an AMD Rome CPU.
Software Dependencies	No	The paper mentions using the "Adam optimizer" and a "linear decoder" but does not specify version numbers for any programming languages, libraries, or other software components used.
Experiment Setup	Yes	We used leaky RNNs in the form: $r_t = \left(1 - \frac{\Delta t}{\tau}\right) r_{t-1} + \frac{\Delta t}{\tau} W_{\theta_r} \tanh(r_{t-1}) + \frac{\Delta t}{\tau} W_{\theta_x} x_t + \frac{\Delta t}{\tau} b$, where $W_{\theta_r}$ is for recurrent connections, $W_{\theta_x}$ for input to hidden neurons, $b$ is the bias, which can be different for each neuron and is trained, and we used a linear decoder as in the main text. For every experiment, we used $\tau = 10$ ms and $N = 50$ neurons. We never trained the input-to-hidden connections, and always trained the recurrent and output connections with a learning rate of 0.001, using the Adam optimizer, except for the delayed XOR task with -BPTT, where we used a learning rate of 0.0003. For initialization, we used $W^{ij}_{\theta_r} \sim \mathcal{N}(0, g^2/N)$, where $g$ is the spectral radius, which was set at 1.5. The number of input units $N_x$ depended on the task: it was 3 for delayed XOR (two stimuli, one cue), 2 for evidence integration (two stimuli, no cue), and 1 for sine wave generation (1 stimulus). Inputs were initialized with $W^{ij}_{\theta_x} \sim \mathcal{N}(0, 1)$, and finally the decoder weights were initialized with W^{ij}_ϕ ∼ N(0, 1 …). As initial values, we used $b = 0$ and $r_0 = 0$. We did not treat $r_0$ as a trainable parameter. We used L2 regularization for neuron activities with a 0.00001 scalar, for evidence integration and the sine wave generation task, which was arbitrarily chosen and was not optimized. For every task, we trained the networks for at most 200 epochs with a batch size of 128. During each epoch, 12 batches were used for testing, i.e., without forcing. We neither clipped the gradients nor used weight decay. Each batch included 512 trials for delayed XOR, 1024 trials for evidence integration, and 2048 trials for sine wave generation.
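
The leaky RNN described in the Experiment Setup row can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is at the repository linked in the Open Source Code row). The function names, the time step dt = 1 ms, the single output unit, the Δt/τ scaling of the bias, and the unit-variance decoder initialization are assumptions; the values τ = 10 ms, N = 50, g = 1.5, and zero initial bias and state come from the row above.

```python
import numpy as np

def init_params(n_neurons=50, n_inputs=3, n_outputs=1, g=1.5, seed=0):
    """Draw initial weights; names and the decoder scale are hypothetical."""
    rng = np.random.default_rng(seed)
    return {
        # Recurrent weights: variance g^2 / N, so std = g / sqrt(N).
        "W_r": rng.normal(0.0, g / np.sqrt(n_neurons), (n_neurons, n_neurons)),
        # Input weights: standard normal, kept fixed during training.
        "W_x": rng.normal(0.0, 1.0, (n_neurons, n_inputs)),
        # Decoder weights: unit variance is an assumption (scale not recoverable).
        "W_out": rng.normal(0.0, 1.0, (n_outputs, n_neurons)),
        # Bias starts at zero and is trainable.
        "b": np.zeros(n_neurons),
    }

def leaky_rnn_step(r_prev, x_t, params, dt=1.0, tau=10.0):
    """One Euler step of the leaky dynamics; dt = 1 ms is an assumption."""
    alpha = dt / tau
    # Assumption: the bias enters the drive and is scaled by dt / tau.
    drive = params["W_r"] @ np.tanh(r_prev) + params["W_x"] @ x_t + params["b"]
    return (1.0 - alpha) * r_prev + alpha * drive

def run_trial(inputs, params, dt=1.0, tau=10.0):
    """Roll the network over a (T, n_inputs) array and decode linearly."""
    r = np.zeros(params["b"].shape[0])  # r_0 = 0, not trained
    outputs = []
    for x_t in inputs:
        r = leaky_rnn_step(r, x_t, params, dt, tau)
        # Assumption: the linear decoder reads the state r directly.
        outputs.append(params["W_out"] @ r)
    return np.stack(outputs)
```

Calling `run_trial(np.zeros((20, 3)), init_params())` returns an array of shape (20, 1). Training with Adam and the error forcing mechanism itself are not covered by this sketch.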