Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Memory-Enhanced Neural Solvers for Routing Problems

Authors: Felix Chalumeau, Refiloe Shabe, Noah De Nicola, Arnu Pretorius, Tom Barrett, Nathan Grinsztajn

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate its effectiveness on the Traveling Salesman and Capacitated Vehicle Routing problems, demonstrating its superiority over tree-search and policy-gradient fine-tuning; and showing that it can be zero-shot combined with diversity-based solvers. We successfully train all RL auto-regressive solvers on large instances, and verify MEMENTO's scalability and data-efficiency: pushing the state-of-the-art on 11 out of 12 evaluated tasks.
Researcher Affiliation	Collaboration	Felix Chalumeau¹, Refiloe Shabe¹, Noah De Nicola², Arnu Pretorius¹, Thomas D. Barrett¹, Nathan Grinsztajn¹. ¹InstaDeep, ²University of Cape Town
Pseudocode	Yes	The details of the MEMENTO training procedure are presented in Algorithm 1 and can be understood as follows.
Open Source Code	Yes	Code availability: We provide access to the code² utilized for training our method and executing all baseline models. We release our checkpoints for all problem types and scales, accompanied by the necessary datasets to replicate our findings. We implement our method and experiments in JAX (Bradbury et al., 2018), along with test sets and checkpoints. Footnote 2: Code, checkpoints, and evaluation sets available at https://github.com/instadeepai/memento
Open Datasets	Yes	We use datasets of 10,000 instances with 100 cities/customer nodes drawn from the training distribution, and three generalization datasets of 1,000 instances of sizes 125, 150, and 200, all from benchmark sets frequently used in the literature (Kool et al., 2019; Kwon et al., 2020; Hottung et al., 2022; Grinsztajn et al., 2023; Chalumeau et al., 2023b). For TSP, we use the dataset from Fu et al. (2021). For CVRP, we use the dataset from Luo et al. (2023).
Dataset Splits	Yes	These instances feature the positions of 100 cities/customers uniformly sampled within the unit square. The benchmark also includes three datasets of distributions not encountered during training, each comprising 1000 problem instances with larger sizes: 125, 150, and 200, generated from a uniform distribution across the unit square. We compare MEMENTO and EAS on a set of 128 unseen instances (of size 500)
Hardware Specification	Yes	We use TPU v3-8 for our experiments. We thank Google's TPU Research Cloud (TRC) for supporting our research with Cloud TPUs.
Software Dependencies	No	We implement our method and experiments in JAX (Bradbury et al., 2018). The two problems are also JAX implementations from Jumanji (Bonnet et al., 2023). CMA-ES implementation to mix MEMENTO and COMPASS is taken from the research package QDax (Chalumeau et al., 2023a). Neural networks, optimizers, and many utilities are implemented using the DeepMind JAX ecosystem (Babuschkin et al., 2020).
Experiment Setup	Yes	All hyperparameters can be found in Appendix C. We report all the hyper-parameters used during train and inference time. For our method MEMENTO, there is no training hyper-parameters to report for instance sizes 125, 150, and 200 as the model used was trained on instances of size 100. The hyper-parameters used for MEMENTO are reported in Table 10.
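
The Dataset Splits row describes instances whose city/customer positions are sampled uniformly within the unit square, at size 100 for the training distribution and sizes 125, 150, and 200 for generalization. As a rough illustration only, the sketch below draws such coordinates with Python's standard library. The function names, the seed handling, and the `SPLIT_SIZES` dictionary are assumptions made for this example; it is not the authors' code and does not reproduce the published benchmark sets, which the paper takes from prior work.

```python
import random

# Instance counts per problem size, as quoted in the table above
# (the dictionary itself is a hypothetical convenience for this sketch).
SPLIT_SIZES = {100: 10_000, 125: 1_000, 150: 1_000, 200: 1_000}


def sample_instance(num_nodes, rng):
    """Return num_nodes (x, y) positions drawn uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(num_nodes)]


def make_split(num_instances, num_nodes, seed):
    """Build a list of instances; a fixed seed makes the split repeatable."""
    rng = random.Random(seed)
    return [sample_instance(num_nodes, rng) for _ in range(num_instances)]


# Small demonstration: three instances of size 125 rather than the full 1,000.
demo = make_split(num_instances=3, num_nodes=125, seed=0)
print(len(demo), len(demo[0]))
```

Seeding a dedicated `random.Random` object, rather than the module-level generator, keeps each split repeatable regardless of other random draws in the same program. To evaluate against the paper's actual test sets, use the files released in the repository linked in the Open Source Code row.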