Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Flow Equivariant Recurrent Neural Networks

Authors: Andy Keller

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In the following, we will review the principle of equivariance in neural network architectures (§2), introduce the notion of Flow Equivariance (§3), and demonstrate that existing RNNs are indeed not flow equivariant as we would desire. We will then introduce Flow Equivariant Recurrent Neural Networks (FERNNs) in §4, and demonstrate empirically that FERNNs achieve zero-shot generalization to new flows at test time, improved length generalization, and improved performance on datasets which possess strong flow symmetries in §5. In our discussion (§6), we briefly outline related work on equivariance with respect to motion, and highlight the differences with our proposed approach, but leave a thorough review of related work to Appendix D. We conclude with the limitations of our proposed framework and promising future directions. We provide an accompanying blog post with additional visualizations. ... [§5 Experiments] In the following, we introduce datasets with known flow symmetries and study how adding flow equivariance impacts performance compared with non-equivariant baselines. We investigate two sequence datasets: (i) next-step prediction on a modified Flowing MNIST dataset with 2 simultaneous digits undergoing imposed translation and rotation flows [Le Cun et al., 1998], and (ii) sequence classification on the KTH human action recognition dataset [Schuldt et al., 2004], augmented with additional translation flows to simulate camera motion. In Appendix B we include the full details of the dataset creation, model architectures, training, and evaluation procedures; and in Appendix E we include extended results. Code is available at: https://github.com/akandykeller/FERNN.
Researcher Affiliation	Academia	T. Anderson Keller, The Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA 15213, EMAIL
Pseudocode	No	The paper describes methods mathematically and textually but does not include any explicit sections or figures labeled 'Pseudocode' or 'Algorithm'.
Open Source Code	Yes	Code is available at: https://github.com/akandykeller/FERNN.
Open Datasets	Yes	We investigate two sequence datasets: (i) next-step prediction on a modified Flowing MNIST dataset with 2 simultaneous digits undergoing imposed translation and rotation flows [Le Cun et al., 1998], and (ii) sequence classification on the KTH human action recognition dataset [Schuldt et al., 2004], augmented with additional translation flows to simulate camera motion. ... KTH action recognition dataset [Schuldt et al., 2004], obtained from http://www.csc.kth.se/cvap/actions/.
Dataset Splits	Yes	We construct sequences from the Flowing MNIST dataset by applying a flow generator ν, randomly picked from the admissible set V_train, V_val, or V_test, to samples from the corresponding train / validation / test split of the original MNIST dataset [Le Cun et al., 1998]. ... The training, validation, and test sets are constructed from this dataset [KTH] by taking the videos from the first 16 people as training, the next 4 people as validation, and the last 5 people as test.
Hardware Specification	Yes	All experiments in this paper were performed on a private cluster containing a mixture of NVIDIA A100 and H100 GPUs, with 40GB and 80GB of VRAM, respectively.
Software Dependencies	No	We use the escnn library to implement the SE(2) convolutions [Cesa et al., 2022]. We additionally zero-pad all images with 6 pixels on each side (resulting in images of size 40x40) to allow the rotation to fit within the full image frame. ... In practice, to implement the spatial rotation we use the PyTorch function F.grid_sample with zero-padding and bilinear interpolation.
Experiment Setup	Yes	All models are trained for 50 epochs, with a learning rate of 1e-4 using the Adam optimizer [Kingma and Ba, 2017]. For translation flows, we use a batch size of 128, and clip gradient magnitudes at 1 in all models for additional stability. For rotation flows, we use a batch size of 32 due to memory constraints, and find gradient clipping unnecessary. ... All models are trained for 500 epochs, with a batch size of 32. We search over learning rates in the set {3e-3, 1e-3, 3e-4, 1e-4} for each model, running three random initialization seeds for each.
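The Research Type excerpt opens by reviewing the principle of equivariance. As a minimal, hypothetical illustration (not code from the FERNN repository), the sketch below checks the classic static case: a map f is translation equivariant if f(shift(x, s)) = shift(f(x), s) for every shift s. A circular 1D convolution passes this check, while a position-dependent weighting does not. Flow equivariance, as the paper describes it, extends this idea to transformations that evolve over time; `shift`, `circular_conv1d`, and `is_translation_equivariant` are illustrative names only.

```python
from typing import Callable, Iterable, List

Signal = List[float]


def shift(x: Signal, s: int) -> Signal:
    """Cyclic translation by s positions: shift(x, s)[i] = x[(i - s) % n]."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def circular_conv1d(x: Signal, kernel: Signal) -> Signal:
    """Circular cross-correlation: y[i] = sum_k kernel[k] * x[(i + k) % n]."""
    n = len(x)
    return [sum(w * x[(i + k) % n] for k, w in enumerate(kernel)) for i in range(n)]


def is_translation_equivariant(
    f: Callable[[Signal], Signal], x: Signal, shifts: Iterable[int]
) -> bool:
    """True if f(shift(x, s)) == shift(f(x), s) for every tested shift s."""
    fx = f(x)
    return all(f(shift(x, s)) == shift(fx, s) for s in shifts)


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 0.0, 0.0, 3.0]
    kernel = [1.0, -1.0, 2.0]
    conv = lambda v: circular_conv1d(v, kernel)
    # Position-dependent weighting breaks translation symmetry.
    weighted = lambda v: [vi * (i + 1) for i, vi in enumerate(v)]
    print(is_translation_equivariant(conv, x, range(len(x))))      # True
    print(is_translation_equivariant(weighted, x, range(len(x))))  # False
```

Circular (wrap-around) indexing is assumed so that every shift is an exact bijection; with zero padding, equivariance would only hold away from the borders.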
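The Open Datasets row describes digits undergoing imposed translation flows. One way to build such a sequence is sketched below, assuming integer per-frame velocities, wrap-around borders, and a pixelwise max to overlay two digits. All three are assumptions for illustration; the actual generation code in the FERNN repository may differ. Frame t is the first frame translated by t times the velocity.

```python
from typing import List, Tuple

Image = List[List[float]]


def translate(img: Image, dx: int, dy: int) -> Image:
    """Shift img right by dx and down by dy pixels, wrapping around the borders (assumed)."""
    h, w = len(img), len(img[0])
    return [[img[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def flow_sequence(img: Image, velocity: Tuple[int, int], num_frames: int) -> List[Image]:
    """Constant-velocity translation flow: frame t equals img translated by t * velocity."""
    vx, vy = velocity
    return [translate(img, t * vx, t * vy) for t in range(num_frames)]


def overlay(a: Image, b: Image) -> Image:
    """Combine two frames pixelwise with max (a hypothetical choice for multi-digit frames)."""
    return [[max(p, q) for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


if __name__ == "__main__":
    digit_a = [[0.0] * 5 for _ in range(5)]
    digit_a[0][0] = 1.0
    digit_b = [[0.0] * 5 for _ in range(5)]
    digit_b[4][4] = 1.0
    seq_a = flow_sequence(digit_a, velocity=(1, 0), num_frames=3)   # moves right
    seq_b = flow_sequence(digit_b, velocity=(0, -1), num_frames=3)  # moves up
    frames = [overlay(fa, fb) for fa, fb in zip(seq_a, seq_b)]
    for row in frames[2]:
        print(row)
```

Giving each digit its own velocity, as above, is one plausible reading of "2 simultaneous digits undergoing imposed ... flows"; the quoted text does not say whether the digits share a flow.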
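The Dataset Splits row splits KTH by subject rather than by video, so no person appears in more than one split. A minimal sketch, assuming the 25 KTH subjects are numbered 1 to 25 and that video filenames follow the `personNN_action_dK_uncomp.avi` pattern (an assumption about the distributed files; `person_of`, `split_kth_by_person`, and `assign_videos` are hypothetical helpers):

```python
import re
from typing import Dict, Iterable, List

# Assumed KTH file naming, e.g. "person01_boxing_d1_uncomp.avi".
_PERSON_RE = re.compile(r"person(\d+)_")


def person_of(filename: str) -> int:
    """Extract the subject number from a KTH-style filename (hypothetical helper)."""
    match = _PERSON_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized filename: {filename!r}")
    return int(match.group(1))


def split_kth_by_person(person_ids: Iterable[int]) -> Dict[str, List[int]]:
    """First 16 subjects -> train, next 4 -> validation, last 5 -> test."""
    ordered = sorted(set(person_ids))
    if len(ordered) != 25:
        raise ValueError(f"expected 25 subjects, got {len(ordered)}")
    return {"train": ordered[:16], "val": ordered[16:20], "test": ordered[20:]}


def assign_videos(filenames: Iterable[str], splits: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Route every video to the split that owns its subject, so no person spans two splits."""
    owner = {pid: name for name, pids in splits.items() for pid in pids}
    result: Dict[str, List[str]] = {name: [] for name in splits}
    for fname in filenames:
        result[owner[person_of(fname)]].append(fname)
    return result


if __name__ == "__main__":
    splits = split_kth_by_person(range(1, 26))
    print({name: (ids[0], ids[-1]) for name, ids in splits.items()})
    # {'train': (1, 16), 'val': (17, 20), 'test': (21, 25)}
```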
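The Software Dependencies row pads each image by 6 pixels per side (so 28x28 MNIST digits become 40x40) and rotates frames with `F.grid_sample` using bilinear interpolation and zero padding. The dependency-free sketch below approximates that behaviour: it inverse-maps each output pixel about the image centre and samples bilinearly, treating out-of-bounds taps as zero. It is an illustration under those assumptions, not the authors' implementation, and it does not reproduce `grid_sample`'s normalized-coordinate or `align_corners` conventions.

```python
import math
from typing import List

Image = List[List[float]]


def zero_pad(img: Image, pad: int = 6) -> Image:
    """Surround img with `pad` rows/columns of zeros on every side (28x28 -> 40x40 for pad=6)."""
    width = len(img[0]) + 2 * pad
    body = [[0.0] * pad + list(row) + [0.0] * pad for row in img]
    return [[0.0] * width for _ in range(pad)] + body + [[0.0] * width for _ in range(pad)]


def bilinear(img: Image, y: float, x: float) -> float:
    """Bilinearly sample img at (y, x); taps that fall outside the image count as zero."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0
    total = 0.0
    for yy, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xx, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total


def rotate(img: Image, angle: float) -> Image:
    """Rotate img by `angle` radians about its centre using inverse mapping."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out: Image = []
    for r in range(h):
        row = []
        for c in range(w):
            y, x = r - cy, c - cx
            # Apply the inverse rotation to find where output pixel (r, c) comes from.
            src_y = cos_a * y - sin_a * x + cy
            src_x = sin_a * y + cos_a * x + cx
            row.append(bilinear(img, src_y, src_x))
        out.append(row)
    return out


if __name__ == "__main__":
    digit = [[1.0] * 28 for _ in range(28)]  # stand-in for a 28x28 MNIST digit
    padded = zero_pad(digit)
    rotated = rotate(padded, math.radians(30))
    print(len(rotated), len(rotated[0]))  # 40 40
```

Inverse mapping (looping over output pixels and asking where each one came from) avoids holes in the rotated image, which is also why `grid_sample` takes a sampling grid rather than a list of destination coordinates.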
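The Experiment Setup row quotes two training regimes. The first is Flowing MNIST: 50 epochs with Adam at 1e-4, batch size 128 with clipping at 1 for translation flows, and batch size 32 without clipping for rotation flows. The second is a 500-epoch setting, presumably KTH, with batch size 32 and a sweep over four learning rates with three seeds each. The sketch below records these as plain configs and enumerates the sweep. The seed values, the `TrainConfig` fields, and the reading of "clip gradient magnitudes at 1" as global-norm clipping (in the spirit of `torch.nn.utils.clip_grad_norm_`) are assumptions.

```python
import itertools
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class TrainConfig:
    epochs: int
    lr: float                   # Adam learning rate
    batch_size: int
    grad_clip: Optional[float]  # max gradient norm; None means no clipping
    seed: int = 0


# Flowing MNIST, as quoted: Adam, lr 1e-4, 50 epochs.
TRANSLATION_FLOWS = TrainConfig(epochs=50, lr=1e-4, batch_size=128, grad_clip=1.0)
ROTATION_FLOWS = TrainConfig(epochs=50, lr=1e-4, batch_size=32, grad_clip=None)


def kth_sweep(
    lrs: Sequence[float] = (3e-3, 1e-3, 3e-4, 1e-4),
    seeds: Sequence[int] = (0, 1, 2),  # hypothetical seed values; the text says "three seeds"
) -> List[TrainConfig]:
    """One run per (learning rate, seed) pair for the 500-epoch setting (assumed to be KTH).

    Gradient clipping is not mentioned for this setting, so it is left off here (assumption).
    """
    return [
        TrainConfig(epochs=500, lr=lr, batch_size=32, grad_clip=None, seed=seed)
        for lr, seed in itertools.product(lrs, seeds)
    ]


def clip_by_global_norm(grads: Sequence[float], max_norm: float) -> List[float]:
    """Rescale grads so their L2 norm is at most max_norm (one reading of 'clip at 1')."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


if __name__ == "__main__":
    runs = kth_sweep()
    print(len(runs))  # 12 runs per model: 4 learning rates x 3 seeds
    print(clip_by_global_norm([3.0, 4.0], 1.0))
```

Keeping each run as an immutable config makes the 4 x 3 grid explicit and easy to log alongside results.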