Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ENMA: Tokenwise Autoregression for Continuous Neural PDE Operators

Authors: Armand Kassaï Koupaï, Lise Le Boudec, Louis Serrano, Patrick Gallinari

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments to evaluate ENMA. Section 4.2 assesses the encoder-decoder in terms of reconstruction error, time-stepping accuracy, and compression rate, comparing against standard neural operator baselines. Section 4.3 evaluates ENMA's generative forecasting ability for both temporal conditioning and Initial Value Problem with context trajectory.
Researcher Affiliation	Collaboration	1 Sorbonne Université, CNRS, ISIR, 75005 Paris, France 2 Criteo AI Lab, Paris, France
Pseudocode	Yes	We present the full inference method of ENMA for latent generation in the pseudo-code 1: Algorithm 1: ENMA Inference: Autoregressive Latent Generation with Cosine Masked Decoding
Open Source Code	No	We will release the code upon acceptance for reproducibility. In the meantime, we precisely detail all training and inference details in the appendices.
Open Datasets	Yes	We evaluate ENMA on standard public benchmarks (Rayleigh-Bénard and Active Matter (Ohana et al., 2024)).
Dataset Splits	Yes	For each system, we generate 12,000 training and 1,200 test trajectories, using a batch size of 10. For evaluation, we generate two test sets: 1,200 trajectories for in-distribution (In-D) and 120 for out-of-distribution (Out-D) evaluation.
Hardware Specification	Yes	All experiments were conducted on an A100.
Software Dependencies	No	The code has been written in PyTorch (Paszke et al., 2019).
Experiment Setup	Yes	Optimizer and Learning Rate Schedule: We use the AdamW optimizer with β1 = 0.9 and β2 = 0.95 for all experiments. The learning rate follows a cosine decay schedule, starting from an initial value of 10^-3 and annealing to 10^-5 over the course of training. To stabilize the early training phase, we apply a linear warmup over the first 500 optimization steps.
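
The learning rate schedule quoted in the Experiment Setup row can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name `learning_rate`, the `total_steps` parameter, the choice to start the warmup near zero, and the choice to begin the cosine decay only after the warmup ends are all assumptions, since the quoted text does not specify them.

```python
import math


def learning_rate(step, total_steps, peak_lr=1e-3, final_lr=1e-5, warmup_steps=500):
    """Hypothetical schedule: linear warmup to peak_lr, then cosine decay to final_lr.

    peak_lr, final_lr and warmup_steps follow the values quoted above;
    total_steps is an assumption (the quoted text does not state it).
    """
    if step < warmup_steps:
        # Linear warmup: ramps from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine decay from peak_lr (progress = 0) down to final_lr (progress = 1).
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 10_000  # assumed training length, for illustration only
    for s in (0, 250, 499, 500, 5_250, 10_000):
        print(s, learning_rate(s, total))
```

In a PyTorch training loop, a function like this could be combined with `torch.optim.AdamW(params, lr=1e-3, betas=(0.9, 0.95))`, for example by dividing its output by `peak_lr` and passing the result as the multiplier to `torch.optim.lr_scheduler.LambdaLR`. That wiring is likewise an assumption about how such a setup might be reproduced.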