Learning World Models for Unconstrained Goal Navigation
Authors: Yuanlin Duan, Wensen Mao, He Zhu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that MUN strengthens the reliability of world models and significantly improves the policy's capacity to generalize across new goal settings. |
| Researcher Affiliation | Academia | Yuanlin Duan, Rutgers University, yuanlin.duan@rutgers.edu; Wensen Mao, Rutgers University, wm300@cs.rutgers.edu; He Zhu, Rutgers University, hz375@cs.rutgers.edu |
| Pseudocode | Yes | Algorithm 1 The main training framework of MUN |
| Open Source Code | Yes | The code for MUN is available on https://github.com/RU-Automated-Reasoning-Group/MUN. |
| Open Datasets | Yes | We conducted experiments on six challenging goal-conditioned tasks to evaluate MUN: Ant-Maze, Walker, 3-Block Stacking, Block Rotation, Pen Rotation, and Fetch Slide. ... We use the 'Fetch Slide-v1' environment from Gymnasium, where the robot operates in a 25-dimensional state space that includes the robot's joint states, object position, and goal information. |
| Dataset Splits | No | The paper does not explicitly specify a standard training/validation/test split for the entire dataset used for training models in the main experimental setup. While a 'validation dataset' is mentioned for assessing world models in Appendix F.3, it is not described as a general split for hyperparameter tuning or early stopping during policy training across all experiments. |
| Hardware Specification | Yes | We conduct each experiment on GPU Nvidia A100 and require about 3GB of GPU memory. |
| Software Dependencies | No | The paper mentions using 'the default hyperparameters of the LEXA backbone MBRL agent' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | We use the default hyperparameters of the LEXA backbone MBRL agent (e.g., learning rate, optimizer, network architecture) and keep them consistent across all baselines. MUN primarily requires hyperparameter tuning in the following: 1) the number of candidate subgoals stored Nsubgoals; 2) the number of subgoals used for navigation when sampling in the environment Ns; and 3) the total episode length L and the maximum number of timesteps allocated for navigating to a specific subgoal Ts. We show these hyperparameters in Table 2. |
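The four tuned hyperparameters quoted above (Nsubgoals, Ns, L, Ts) could be captured in a small config sketch. This is illustrative only: the field names and default values below are hypothetical, not taken from MUN's released code or the paper's Table 2.

```python
from dataclasses import dataclass


@dataclass
class MUNConfig:
    """Illustrative container for MUN's tuned hyperparameters.

    Names and defaults are assumptions for the sketch; the paper lists
    the quantities but not the identifiers used in the released code.
    """
    n_subgoals: int = 100         # Nsubgoals: candidate subgoals stored
    n_nav_subgoals: int = 10      # Ns: subgoals navigated per episode
    episode_length: int = 1000    # L: total episode length
    max_nav_timesteps: int = 100  # Ts: max timesteps per subgoal

    def __post_init__(self) -> None:
        # Sanity checks: we cannot navigate to more subgoals than are
        # stored, and Ns navigations of up to Ts steps each must fit
        # inside the episode budget L.
        assert self.n_nav_subgoals <= self.n_subgoals
        assert self.n_nav_subgoals * self.max_nav_timesteps <= self.episode_length
```

Centralizing the four knobs in one dataclass makes it easy to sweep them per task while inheriting the fixed LEXA backbone hyperparameters elsewhere.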