Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Learning long range dependencies through time reversal symmetry breaking

Authors: Guillaume Pourcel, Maxence Ernoult

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We apply RHEL to train HSSMs with linear and nonlinear dynamics on a variety of time-series tasks ranging from mid-range to long-range classification and regression with sequence length reaching 50k. We show that RHEL consistently matches the performance of BPTT across all models and tasks. This work opens new doors for the design of scalable, energy-efficient physical systems endowed with self-learning capabilities for sequence modelling.
Researcher Affiliation Collaboration Guillaume Pourcel (University of Groningen, INRIA); Maxence Ernoult (Google DeepMind)
Pseudocode Yes Alg. 1 prescribes an intuitive recipe to compute d_θ L for the above optimization problem (Eq. (16)), chaining RHEL backward through HRUs. Namely, the echo dynamics of the top-most HRU read as Eq. (13) using the initial learning signal ΦL to nudge its trajectory. On top of estimating its parameter gradients, we also estimate its input gradients, which are used to nudge the echo dynamics of the preceding HRU. This procedure is repeated until reaching the first HRU (see Fig. 2). Algorithm 1: Recurrent Hamiltonian Echo Learning (RHEL) on a single HRU. Inputs: Φ0 (final state of the forward trajectory), Φ (incoming gradient), ϵ (nudging strength), δ (timestep). Outputs: θ (parameter gradient estimate), u (input gradient estimate).
Open Source Code Yes Our code is available on: https://github.com/guillaumepourcel/rhel
Open Datasets Yes The classification datasets are drawn from a recently introduced benchmark [34] that selects a subset of the University of East Anglia (UEA) datasets [35], specifically choosing those with the longest sequences to increase difficulty, and which has been recently employed to evaluate the linear HSSM model [29]. These datasets include EigenWorms (17,984 sequence length, 5 classes), SelfRegulationSCP1 (896 length, 2 classes), SelfRegulationSCP2 (1,152 length, 2 classes), EthanolConcentration (1,751 length, 4 classes), Heartbeat (405 length, 2 classes), and MotorImagery (3,000 length, 2 classes). Additionally, we evaluate our HSSMs on the PPG-DaLiA dataset [36], a multivariate time series regression dataset designed for heart rate prediction using data collected from a wrist-worn device.
Dataset Splits Yes The classification datasets are drawn from a recently introduced benchmark [34] that selects a subset of the University of East Anglia (UEA) datasets [35], specifically choosing those with the longest sequences to increase difficulty... Additionally, we evaluate our HSSMs on the PPG-DaLiA dataset [36]... using the ADAM optimizer with default parameters and an early-stopping procedure based on the validation loss.
Hardware Specification Yes All experiments were run on Nvidia V100 GPUs, except for the PPG experiments, which were run on Nvidia Tesla A100 GPUs due to larger memory demands.
Software Dependencies No The code to run the experiments is implemented using the JAX autodifferentiation framework [74].
Experiment Setup Yes The hyperparameters are: learning rate (lr), number of layers (#blocks), number of hidden neurons (hidden dim), state-space dimension (state dim), and whether the time dimension is sent as input (include time). These hyperparameters were found by grid search and are presented in Table 4. We reused the optimization scheme from [29], using the ADAM optimizer with default parameters and an early-stopping procedure based on the validation loss. For the RHEL algorithm, we have two additional hyperparameters: the nudging strength ϵ and the scaling factor γ (see Appx. A.5.4). The nudging strength ϵ was set to 10^-1 without prior tuning. For the scaling factor γ we did a grid search over the values {10^0, 10^1, 10^2, 10^4} for the regression task (PPG-DaLiA) and found that the best-performing parameter was 10^4. For the classification tasks, we performed a grid search over the values {10^0, 10^4, 10^8, 10^12} and found that the best-performing scaling was 10^4 based on the averaged score.
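
The selection procedure quoted in the Experiment Setup row (a grid search over the scaling factor γ, with early stopping on the validation loss) could look roughly like the Python sketch below. It is an illustration only and is not taken from the authors' repository. The grid values are read as powers of ten, which is an assumption about the extracted text. The names `EarlyStopping`, `select_gamma`, `score_fn`, and the `patience` parameter are hypothetical, and the paper excerpt does not state a patience value.

```python
from typing import Callable, Dict, Sequence, Tuple

# Scaling-factor grids from the setup description, read as powers of ten
# (assumption about the garbled exponents in the extracted text).
GAMMA_GRID_REGRESSION = [1e0, 1e1, 1e2, 1e4]
GAMMA_GRID_CLASSIFICATION = [1e0, 1e4, 1e8, 1e12]


class EarlyStopping:
    """Signal a stop once the validation loss has failed to improve
    for `patience` consecutive checks (hypothetical helper)."""

    def __init__(self, patience: int = 3) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best:
            # New best validation loss: reset the counter.
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def select_gamma(
    grid: Sequence[float],
    tasks: Sequence[str],
    score_fn: Callable[[str, float], float],
) -> Tuple[float, Dict[float, float]]:
    """Return the gamma whose score, averaged over all tasks, is highest,
    together with the per-gamma averages.

    `score_fn(task, gamma)` is a placeholder for a full training run that
    returns a validation score where higher is better.
    """
    averages = {
        gamma: sum(score_fn(task, gamma) for task in tasks) / len(tasks)
        for gamma in grid
    }
    best = max(averages, key=averages.get)
    return best, averages
```

In use, `score_fn` would wrap one training run per (task, γ) pair, with an `EarlyStopping` instance consulted after each validation pass. Averaging over tasks before taking the maximum mirrors the "averaged score" criterion described for the classification benchmarks; for a loss-based regression metric the score would need to be negated first.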