Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

Authors: Gunshi Gupta, Karmesh Yadav, Zsolt Kira, Yarin Gal, Rahaf Aljundi

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experimental results highlighting different aspects of our proposed method. We first describe the benchmark tasks and baselines used in our experiments, followed by different experimental insights in the following subsections.
Researcher Affiliation | Collaboration | Gunshi Gupta (University of Oxford), Karmesh Yadav (Georgia Tech University), Zsolt Kira (Georgia Tech University), Yarin Gal (University of Oxford), Rahaf Aljundi (Toyota Motor Europe)
Pseudocode | Yes | The encoding and integration of summary tokens into future segments are illustrated in Figure 1, and the pseudocode is provided in Appendix A.4. [...] Algorithm 1 Memo [...] Algorithm 2 Train Memo
Open Source Code | Yes | We plan to fully open-source our implementation upon publication and include an early release version at https://github.com/Memory-icrl/memo. [In the NeurIPS Paper Checklist, Question 4, Justification]: Provided in the Appendix. We also release our code.
Open Datasets | Yes | EXTOBJNAV: The Extended Object Navigation (EXTOBJNAV) task, first introduced in [11], builds on the OBJECTNAV task commonly used in embodied AI research [1, 20]. The EXTOBJNAV task uses 37 training and 12 validation scenes from HSSD [16] and includes 20 object instances from the YCB dataset [6]. [...] Dark-Key-To-Door: Dark-Key-To-Door [18] is a Meta-RL [2] benchmark
Dataset Splits | Yes | The EXTOBJNAV task uses 37 training and 12 validation scenes from HSSD [16] and includes 20 object instances from the YCB dataset [6]. These objects can be randomly placed on receptacles throughout the scene, with each placement featuring an average of 30 objects. We use 11,100 novel placements during training and 108 during evaluation. [...] Dark-Key-To-Door: We evaluate performance by average reward across 960 trials and 3 validation seeds.
Hardware Specification | Yes | Hardware: 16x NVIDIA A40 GPUs
Software Dependencies | No | The paper mentions several algorithms and optimizers with citations (e.g., DDPPO [30], Adam [17]), and components like GELU [26] and RoPE [28], but does not specify versions for programming languages, frameworks, or libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Table 2: Comprehensive training setup and hyperparameters related to the EXTOBJNAV task. Model Architecture: # Layers: 4; # Heads: 8; Hidden dimensions: 256; MLP Hidden dimensions: 1024; Activation: GELU [26]. Training Setup: Workers: 320; Batch Size: 160; RL Algorithm: DDPPO [30]; Discount Factor (γ): 0.99; GAE Parameter (τ): 0.95; Entropy Coefficient: 0.1; Value Loss Coefficient: 0.5. Optimization: Optimizer: Adam [17]; Regularization: Depth dropout [15] with value 0.1; Learning Rate Schedule: Warm-up for 100K env interactions; Initial Learning Rate: 4e-7; Learning Rate at Warm-up End: 4e-4; Decay Schedule: Cosine decay [19] to 0 after 1B steps. Computation: Precision: FP16 for visual encoder, FP32 for other model components; Rollout size: 4096; Total # updates per rollout: 16; # partial updates: 15; # full updates: 1.
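The learning-rate schedule listed under Experiment Setup can be sketched as a small function. This is an illustrative reading of the reported values and not the authors' implementation: the function name `learning_rate` is hypothetical, the warm-up is assumed to be linear (the source only says "warm-up"), steps are assumed to count environment interactions, and the cosine decay is assumed to run from the end of warm-up until 1B total steps.

```python
import math

# Values reported in the Experiment Setup row (Table 2 of the paper).
WARMUP_STEPS = 100_000        # "Warm-up for 100K env interactions"
TOTAL_STEPS = 1_000_000_000   # "Cosine decay to 0 after 1B steps"
LR_INIT = 4e-7                # initial learning rate
LR_PEAK = 4e-4                # learning rate at the end of warm-up


def learning_rate(step: int) -> float:
    """Return the learning rate at a given environment step.

    Assumptions (not stated in the source): linear warm-up, and a
    cosine decay window spanning WARMUP_STEPS..TOTAL_STEPS.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < WARMUP_STEPS:
        # Linear interpolation from LR_INIT to LR_PEAK (assumed shape).
        frac = step / WARMUP_STEPS
        return LR_INIT + frac * (LR_PEAK - LR_INIT)
    if step >= TOTAL_STEPS:
        # The schedule has fully decayed.
        return 0.0
    # Cosine decay from LR_PEAK down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * LR_PEAK * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 50_000, 100_000, 500_050_000, 1_000_000_000):
        print(f"step={s:>13,d}  lr={learning_rate(s):.3e}")
```

Under these assumptions the rate starts at 4e-7, reaches 4e-4 at 100K steps, and falls to 0 at 1B steps. If the paper's decay window is measured differently (for example, 1B steps after warm-up), only the `progress` computation would change.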