Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Mechanistic Interpretability of RNNs emulating Hidden Markov Models

Authors: Elia Torre, Michele Viscione, Lucas Pompe, Benjamin F. Grewe, Valerio Mante

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Here we first show that RNNs can replicate HMM emission statistics and then reverse-engineer the trained networks to uncover the mechanisms they implement. In the absence of inputs, the activity of trained RNNs collapses towards a single fixed point. When driven by stochastic input, trajectories instead exhibit noise-sustained dynamics along closed orbits. Rotation along these orbits modulates the emission probabilities and is governed by transitions between regions of slow, noise-driven dynamics connected by fast, deterministic transitions. The trained RNNs develop highly structured connectivity, with a small set of "kick neurons" initiating transitions between these regions. This mechanism emerges during training as the network shifts into a regime of stochastic resonance, enabling it to perform probabilistic computations. Analyses across multiple HMM architectures (fully connected, cyclic, and linear-chain) reveal that this solution generalizes through the modular reuse of the same dynamical motif, suggesting a compositional principle by which RNNs can emulate complex discrete latent dynamics.
Researcher Affiliation Academia Institute of Neuroinformatics, University of Zurich & ETH Zurich
Pseudocode No The paper describes the network architecture mathematically in Equation (2) and outlines the training process, but it does not present any structured pseudocode or algorithm blocks. For example, the hidden state update is given as: $h_t = \mathrm{ReLU}(h_{t-1} W_{hh}^\top + x_t W_{ih}^\top)$, $y_t = h_t A^\top$ (2).
Open Source Code Yes Code available at https://github.com/EliaTorre/hmmrnn.
Open Datasets No Target sequences Y are generated by linear-chain, fully-connected, and cyclic HMMs (described in Section 3.1 and Appendix A); the example shows a 2-state linear-chain model. ... To understand how RNNs encode discrete, probabilistic structure in their state spaces, we trained them to replicate the outputs of HMMs, which implement a process that is discrete and probabilistic by construction (Fig. 1).
Dataset Splits No Each RNN is trained on 30,000 sequences of fixed length, sampled from its corresponding HMM: 100 for M = 2; 30 for M = 3, 4; 40 for M = 5; 30 for the fully-connected; and 40 for the cyclic architectures.
Hardware Specification Yes On an NVIDIA RTX 4090 GPU, each model completed training in approximately 5–20 minutes, depending on sequence length and network size.
Software Dependencies No The paper mentions using the Adam optimizer [23] but does not specify its version. It also mentions other methods like vanilla RNNs [12] and Gumbel-Softmax reparametrization trick [21] [27] but these are not specific software libraries with version numbers. Optimization is performed in batches of 4096 using the Adam optimizer [23] with a learning rate of 0.001.
Experiment Setup Yes We employ standard, vanilla RNNs [12] of hidden-state size $|h| \in \{50, 150, 200\}$. At each time-step, the network receives Gaussian input $x_t \sim \mathcal{N}(0, I_d)$, with $d \in \{1, 10, 100, 200\}$. The hidden state is updated and projects onto the three logits via: $h_t = \mathrm{ReLU}(h_{t-1} W_{hh}^\top + x_t W_{ih}^\top)$, $y_t = h_t A^\top$ (2). ... temperature τ (set to 1 in all experiments) ... Optimization is performed in batches of 4096 using the Adam optimizer [23] with a learning rate of 0.001. Hidden states are initialized to zero at the start of training, and all weights are drawn from a uniform distribution U(1/√k), where k = hidden_size. To stabilize learning and mitigate exploding gradients, we apply gradient clipping with a maximum norm of 0.9 for the linear-chain models and 0.3 for the fully-connected and cyclic architectures. Training proceeded until convergence, typically reached within 1000 epochs.
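
For orientation, the forward pass quoted in the Experiment Setup row (Equation (2), zero initial hidden state, Gaussian input, three output logits) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function names `init_rnn` and `run_rnn` are hypothetical, the uniform initialization is assumed to mean the symmetric interval from -1/sqrt(k) to 1/sqrt(k) with k = hidden_size, and training (Adam, gradient clipping, the Gumbel-Softmax step) is left out.

```python
import numpy as np


def init_rnn(input_dim, hidden_size, n_out=3, seed=0):
    """Draw all weights uniformly from [-1/sqrt(k), 1/sqrt(k)], k = hidden_size.

    The symmetric interval is an assumption; the quoted text only gives 1/sqrt(k).
    """
    rng = np.random.default_rng(seed)
    bound = 1.0 / np.sqrt(hidden_size)
    W_ih = rng.uniform(-bound, bound, size=(hidden_size, input_dim))
    W_hh = rng.uniform(-bound, bound, size=(hidden_size, hidden_size))
    A = rng.uniform(-bound, bound, size=(n_out, hidden_size))
    return W_ih, W_hh, A


def run_rnn(W_ih, W_hh, A, T, seed=0):
    """Apply Equation (2) for T time steps and return the logits, shape (T, n_out)."""
    rng = np.random.default_rng(seed)
    hidden_size, input_dim = W_ih.shape
    h = np.zeros(hidden_size)  # hidden state initialized to zero
    logits = np.empty((T, A.shape[0]))
    for t in range(T):
        x = rng.standard_normal(input_dim)            # x_t ~ N(0, I_d)
        h = np.maximum(h @ W_hh.T + x @ W_ih.T, 0.0)  # h_t = ReLU(h_{t-1} W_hh^T + x_t W_ih^T)
        logits[t] = h @ A.T                           # y_t = h_t A^T
    return logits


# Example: one of the quoted configurations (|h| = 50, d = 10), sequence length 30.
W_ih, W_hh, A = init_rnn(input_dim=10, hidden_size=50)
logits = run_rnn(W_ih, W_hh, A, T=30)
print(logits.shape)
```

The weights here are untrained, so the logits carry no HMM structure; the sketch only fixes the shapes and the order of operations in the update.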