Learning World Models for Unconstrained Goal Navigation

Authors: Yuanlin Duan, Wensen Mao, He Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that MUN strengthens the reliability of world models and significantly improves the policy's capacity to generalize across new goal settings.
Researcher Affiliation | Academia | Yuanlin Duan, Rutgers University, yuanlin.duan@rutgers.edu; Wensen Mao, Rutgers University, wm300@cs.rutgers.edu; He Zhu, Rutgers University, hz375@cs.rutgers.edu
Pseudocode | Yes | Algorithm 1: The main training framework of MUN
Open Source Code | Yes | The code for MUN is available at https://github.com/RU-Automated-Reasoning-Group/MUN.
Open Datasets | Yes | We conducted experiments on six challenging goal-conditioned tasks to evaluate MUN: Ant-Maze, Walker, 3-Block Stacking, Block Rotation, Pen Rotation, and Fetch Slide. ... We use the 'Fetch Slide-v1' environment from Gymnasium, where the robot operates in a 25-dimensional state space that includes the robot's joint states, object position, and goal information. (See the environment-inspection sketch below the table.)
Dataset Splits | No | The paper does not explicitly specify a training/validation/test split for the data used in the main experiments. While a 'validation dataset' is mentioned for assessing world models in Appendix F.3, it is not described as a general split for hyperparameter tuning or early stopping during policy training across all experiments.
Hardware Specification | Yes | We conduct each experiment on an Nvidia A100 GPU and require about 3GB of GPU memory.
Software Dependencies | No | The paper mentions using 'the default hyperparameters of the LEXA backbone MBRL agent' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or other library versions).
Experiment Setup | Yes | We use the default hyperparameters of the LEXA backbone MBRL agent (e.g., learning rate, optimizer, network architecture) and keep them consistent across all baselines. MUN primarily requires hyperparameter tuning in the following: 1) the number of candidate subgoals stored, N_subgoals; 2) the number of subgoals used for navigation when sampling in the environment, N_s; and 3) the total episode length L and the maximum number of timesteps allocated for navigating to a specific subgoal, T_s. We show these hyperparameters in Table 2. (See the exploration-loop sketch below the table.)
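The 'Open Datasets' row quotes a 25-dimensional state space for the 'Fetch Slide-v1' environment. Below is a minimal environment-inspection sketch, assuming the gymnasium-robotics package supplies the Fetch tasks; depending on the installed release, the registered id may be "FetchSlide-v1" or a newer revision such as "FetchSlide-v2".

```python
# Minimal sketch, assuming gymnasium-robotics is installed; the registered
# env id may differ from the paper's 'Fetch Slide-v1' in newer releases.
import gymnasium as gym
import gymnasium_robotics  # noqa: F401 -- importing registers the Fetch envs

env = gym.make("FetchSlide-v1")
obs, info = env.reset(seed=0)
print(obs["observation"].shape)   # (25,): joint states, object pose, velocities
print(obs["desired_goal"].shape)  # (3,): target position for the object
env.close()
```

Gymnasium's Fetch tasks expose a dict observation, so the 25-dimensional state quoted above lives under the "observation" key, with the goal carried separately under "desired_goal".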
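The 'Experiment Setup' row names four tuned knobs (N_subgoals, N_s, L, T_s) without showing how they interact. The exploration-loop sketch below is an illustrative reconstruction, not the authors' Algorithm 1: every name (collect_episode, policy, subgoal_buffer, replay_buffer) and every constant value is a hypothetical placeholder, and it only shows where each hyperparameter would enter a subgoal-chaining rollout.

```python
import random

# Hypothetical placeholder values; the actual settings are in the paper's Table 2.
N_SUBGOALS = 100  # candidate subgoals kept in the subgoal buffer (N_subgoals)
N_S = 4           # subgoals navigated to per episode (N_s)
EP_LEN = 400      # total episode length (L)
T_S = 100         # max timesteps allocated to reach one subgoal (T_s)

def collect_episode(env, policy, subgoal_buffer, replay_buffer):
    """Chain navigation between N_s subgoals within one L-step episode."""
    # Keep only the newest N_subgoals candidates, then pick N_s of them.
    candidates = list(subgoal_buffer)[-N_SUBGOALS:]
    subgoals = random.sample(candidates, k=min(N_S, len(candidates)))
    obs, _ = env.reset()
    t = 0
    for goal in subgoals:
        # Give the goal-conditioned policy at most T_s steps per subgoal.
        for _ in range(T_S):
            if t >= EP_LEN:
                return
            action = policy(obs, goal)
            next_obs, reward, terminated, truncated, _ = env.step(action)
            replay_buffer.append((obs, action, next_obs, goal))
            obs = next_obs
            t += 1
            if terminated or truncated:
                return
```

A real implementation would interleave such rollouts with retraining the world model and the goal-conditioned policy on replay_buffer, which is the part Algorithm 1 in the paper specifies and this sketch deliberately omits.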