Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

Authors: Gunshi Gupta, Karmesh Yadav, Zsolt Kira, Yarin Gal, Rahaf Aljundi

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we present experimental results highlighting different aspects of our proposed method. We first describe the benchmark tasks and baselines used in our experiments, followed by different experimental insights in the following subsections.
Researcher Affiliation	Collaboration	Gunshi Gupta University of Oxford Karmesh Yadav Georgia Tech University Zsolt Kira Georgia Tech University Yarin Gal University of Oxford Rahaf Aljundi Toyota Motor Europe
Pseudocode	Yes	The encoding and integration of summary tokens into future segments are illustrated in Figure 1, and the pseudocode is provided in Appendix A.4. [...] Algorithm 1 Memo [...] Algorithm 2 Train Memo
Open Source Code	Yes	We plan to fully open-source our implementation upon publication and include an early release version at https://github.com/Memory-icrl/memo. [In the NeurIPS Paper Checklist, Question 4, Justification]: Provided in the Appendix. We also release our code.
Open Datasets	Yes	EXTOBJNAV: The Extended Object Navigation (EXTOBJNAV) task, first introduced in [11], builds on the OBJECTNAV task commonly used in embodied AI research [1, 20]. The EXTOBJNAV task uses 37 training and 12 validation scenes from HSSD [16] and includes 20 object instances from the YCB dataset [6]. [...] Dark-Key-To-Door: Dark-Key-To-Door [18] is a Meta-RL [2] benchmark
Dataset Splits	Yes	The EXTOBJNAV task uses 37 training and 12 validation scenes from HSSD [16] and includes 20 object instances from the YCB dataset [6]. These objects can be randomly placed on receptacles throughout the scene, with each placement featuring an average of 30 objects. We use 11,100 novel placements during training and 108 during evaluation. [...] Dark-Key-To-Door: We evaluate performance by average reward across 960 trials and 3 validation seeds.
Hardware Specification	Yes	Hardware 16x NVIDIA A40 GPUs
Software Dependencies	No	The paper mentions several algorithms and optimizers with citations (e.g., DDPPO [30], Adam [17]), and components like GeLU [26] and RoPE [28], but does not specify software library versions for general programming languages or frameworks like Python, PyTorch, or CUDA.
Experiment Setup	Yes	Table 2: Comprehensive training setup and hyperparameters related to the EXTOBJNAV task. Model Architecture: # Layers 4, # Heads 8, Hidden dimensions 256, MLP Hidden dimensions 1024, Activation GeLU [26]. Training Setup: Workers 320, Batch Size 160, RL Algorithm DDPPO [30], Discount Factor (γ) 0.99, GAE Parameter (τ) 0.95, Entropy Coefficient 0.1, Value Loss Coefficient 0.5. Optimization: Optimizer Adam [17], Regularization Depth dropout [15] with value 0.1, Learning Rate Schedule Warm-up for 100K env interactions, Initial Learning Rate 4e-7, Learning Rate at Warm-up End 4e-4, Decay Schedule Cosine decay [19] to 0 after 1B steps. Computation Precision FP16 for visual encoder, FP32 for other model components, Rollout size 4096, Total # updates per rollout 16, # partial updates 15, # full updates 1.
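
The learning-rate schedule quoted in the Experiment Setup row (warm-up from 4e-7 to 4e-4 over 100K environment interactions, then cosine decay to 0 after 1B steps) can be sketched as a small function. This is an illustrative reading of the quoted values, not the authors' implementation. It assumes the warm-up is linear, that the step counter is in environment interactions, and that the 1B figure is the total step count including warm-up; none of these are stated in the excerpt. The names `WARMUP_STEPS`, `TOTAL_STEPS`, `LR_START`, `LR_PEAK`, and `learning_rate` are hypothetical.

```python
import math

# Values taken from the quoted Table 2 excerpt.
WARMUP_STEPS = 100_000         # "Warm-up for 100K env interactions"
TOTAL_STEPS = 1_000_000_000    # "Cosine decay to 0 after 1B steps" (assumed to include warm-up)
LR_START = 4e-7                # "Initial Learning Rate"
LR_PEAK = 4e-4                 # "Learning Rate at Warm-up End"


def learning_rate(step: int) -> float:
    """Return the learning rate at a given step (hypothetical helper).

    Assumption: linear warm-up; the excerpt does not state its shape.
    """
    if step < WARMUP_STEPS:
        # Linear interpolation from LR_START to LR_PEAK.
        frac = step / WARMUP_STEPS
        return LR_START + frac * (LR_PEAK - LR_START)
    # Cosine decay from LR_PEAK to 0 over the remaining steps,
    # clamped so the rate stays at 0 past TOTAL_STEPS.
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return 0.5 * LR_PEAK * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 50_000, 100_000, 500_000_000, 1_000_000_000):
        print(f"step {s:>13,d}: lr = {learning_rate(s):.3e}")
```

If the 1B steps were instead counted from the end of warm-up, only the denominator in `progress` would change.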