Learning World Models for Unconstrained Goal Navigation

Authors: Yuanlin Duan, Wensen Mao, He Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that MUN strengthens the reliability of world models and significantly improves the policy's capacity to generalize across new goal settings.
Researcher Affiliation | Academia | Yuanlin Duan, Rutgers University, yuanlin.duan@rutgers.edu; Wensen Mao, Rutgers University, wm300@cs.rutgers.edu; He Zhu, Rutgers University, hz375@cs.rutgers.edu
Pseudocode | Yes | Algorithm 1: The main training framework of MUN
Open Source Code | Yes | The code for MUN is available at https://github.com/RU-Automated-Reasoning-Group/MUN.
Open Datasets | Yes | We conducted experiments on six challenging goal-conditioned tasks to evaluate MUN: Ant-Maze, Walker, 3-Block Stacking, Block Rotation, Pen Rotation, and Fetch Slide. ... We use the 'Fetch Slide-v1' environment from Gymnasium, where the robot operates in a 25-dimensional state space that includes the robot's joint states, object position, and goal information. (See the environment-inspection sketch below the table.)
Dataset Splits | No | The paper does not explicitly specify a training/validation/test split for the data used in the main experiments. While a 'validation dataset' is mentioned for assessing world models in Appendix F.3, it is not described as a general split for hyperparameter tuning or early stopping during policy training across all experiments.
Hardware Specification | Yes | We conduct each experiment on an Nvidia A100 GPU and require about 3GB of GPU memory.
Software Dependencies | No | The paper mentions using 'the default hyperparameters of the LEXA backbone MBRL agent' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or other library versions).
Experiment Setup | Yes | We use the default hyperparameters of the LEXA backbone MBRL agent (e.g., learning rate, optimizer, network architecture) and keep them consistent across all baselines. MUN primarily requires hyperparameter tuning in the following: 1) the number of candidate subgoals stored, N_subgoals; 2) the number of subgoals used for navigation when sampling in the environment, N_s; and 3) the total episode length L and the maximum number of timesteps allocated for navigating to a specific subgoal, T_s. We show these hyperparameters in Table 2. (See the exploration-loop sketch below the table.)
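The 'Open Datasets' row quotes a 25-dimensional state space for the 'Fetch Slide-v1' environment. Below is a minimal environment-inspection sketch, assuming the gymnasium-robotics package supplies the Fetch tasks; depending on the installed release, the registered id may be "FetchSlide-v1" or a newer revision such as "FetchSlide-v2".

```python
# Minimal sketch, assuming gymnasium-robotics is installed; the registered
# env id may differ from the paper's 'Fetch Slide-v1' in newer releases.
import gymnasium as gym
import gymnasium_robotics  # noqa: F401 -- importing registers the Fetch envs

env = gym.make("FetchSlide-v1")
obs, info = env.reset(seed=0)
print(obs["observation"].shape)   # (25,): joint states, object pose, velocities
print(obs["desired_goal"].shape)  # (3,): target position for the object
env.close()
```

Gymnasium's Fetch tasks expose a dict observation, so the 25-dimensional state quoted above lives under the "observation" key, with the goal carried separately under "desired_goal".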
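The 'Experiment Setup' row names four tuned knobs (N_subgoals, N_s, L, T_s) without showing how they interact. The exploration-loop sketch below is an illustrative reconstruction, not the authors' Algorithm 1: every name (collect_episode, policy, subgoal_buffer, replay_buffer) and every constant value is a hypothetical placeholder, and it only shows where each hyperparameter would enter a subgoal-chaining rollout.

```python
import random

# Hypothetical placeholder values; the actual settings are in the paper's Table 2.
N_SUBGOALS = 100  # candidate subgoals kept in the subgoal buffer (N_subgoals)
N_S = 4           # subgoals navigated to per episode (N_s)
EP_LEN = 400      # total episode length (L)
T_S = 100         # max timesteps allocated to reach one subgoal (T_s)

def collect_episode(env, policy, subgoal_buffer, replay_buffer):
    """Chain navigation between N_s subgoals within one L-step episode."""
    # Keep only the newest N_subgoals candidates, then pick N_s of them.
    candidates = list(subgoal_buffer)[-N_SUBGOALS:]
    subgoals = random.sample(candidates, k=min(N_S, len(candidates)))
    obs, _ = env.reset()
    t = 0
    for goal in subgoals:
        # Give the goal-conditioned policy at most T_s steps per subgoal.
        for _ in range(T_S):
            if t >= EP_LEN:
                return
            action = policy(obs, goal)
            next_obs, reward, terminated, truncated, _ = env.step(action)
            replay_buffer.append((obs, action, next_obs, goal))
            obs = next_obs
            t += 1
            if terminated or truncated:
                return
```

A real implementation would interleave such rollouts with retraining the world model and the goal-conditioned policy on replay_buffer, which is the part Algorithm 1 in the paper specifies and this sketch deliberately omits.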