World Model as a Graph: Learning Latent Landmarks for Planning
Authors: Lunjun Zhang, Ge Yang, Bradly C. Stadie
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments and Evaluation: We investigate the impact of L3P in a variety of robotic manipulation and navigation environments. These include standard benchmarks such as Fetch-Pick And Place, and more difficult environments such as Ant Maze-Hard and Place-Inside-Box that have been engineered to require test-time generalization. |
| Researcher Affiliation | Academia | ¹University of Toronto, ²Vector Institute, ³MIT, ⁴Toyota Technological Institute at Chicago. |
| Pseudocode | Yes | Algorithm 1 Online Planning in L3P (see the illustrative sketch after the table) |
| Open Source Code | Yes | Code for L3P is available at: https://github.com/LunjunZhang/world-model-as-a-graph. |
| Open Datasets | Yes | We investigate the impact of L3P in a variety of robotic manipulation and navigation environments. These include standard benchmarks such as Fetch-Pick And Place (Plappert et al., 2018; Andrychowicz et al., 2017), and more difficult environments such as Ant Maze-Hard and Place-Inside-Box that have been engineered to require test-time generalization. |
| Dataset Splits | No | The paper discusses 'training' and 'test time' scenarios (e.g., 'During training, the goal distribution...', 'At test time, we always initialize the agent...'), and references environments as 'training' or 'test' environments, but it does not explicitly mention or specify a 'validation' dataset split. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms used, such as DDPG and HER, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We use a short, 200-timestep time horizon during training and a ρ₀ that is uniform in the maze. At test time, we always initialize the agent on one end of the maze, and set the goal on the other end. The horizon of the test environment is 500 steps. (See the configuration sketch after the table.) |
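The paper's Algorithm 1 ("Online Planning in L3P") is only named in the table above, not reproduced. As a rough illustration of what online planning over a graph of latent landmarks can look like, the sketch below runs a shortest-path search over landmark-to-landmark distance estimates and hands the next landmark to a goal-conditioned policy as a sub-goal. Every name here (`encode`, `policy`, `landmarks`, `pairwise_dist`) and the environment interface are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

# Hedged sketch only: `encode`, `policy`, `landmarks`, and `pairwise_dist`
# are hypothetical placeholders, not the authors' released API.

def dijkstra_path(dist, start, goal):
    """Shortest path over a landmark graph whose edge weights dist[i, j]
    are estimated temporal distances (e.g. derived from a value function)."""
    n = dist.shape[0]
    best = np.full(n, np.inf)
    prev = np.full(n, -1, dtype=int)
    done = np.zeros(n, dtype=bool)
    best[start] = 0.0
    for _ in range(n):
        u = int(np.argmin(np.where(done, np.inf, best)))
        if not np.isfinite(best[u]):
            break
        done[u] = True
        relax = best[u] + dist[u]
        improved = (~done) & (relax < best)
        prev[improved] = u
        best = np.where(improved, relax, best)
    path, node = [], goal
    while node != -1:
        path.append(node)
        node = prev[node]
    return path[::-1]

def landmark_planning_episode(env, encode, policy, landmarks, pairwise_dist,
                              max_steps=500, replan_every=10):
    """Attach the current state and the goal to the landmark graph, plan a
    shortest path, and feed the next landmark to a goal-conditioned policy
    as a sub-goal; replan periodically."""
    obs, goal = env.reset()  # assumed goal-conditioned interface
    sub_goal = goal
    for t in range(max_steps):
        if t % replan_every == 0:
            # Distance matrix over [landmarks..., current state, goal];
            # the last two rows/columns act as temporary graph nodes.
            dist = pairwise_dist(landmarks, encode(obs), encode(goal))
            n = dist.shape[0]
            path = dijkstra_path(dist, start=n - 2, goal=n - 1)
            # Next landmark on the path, or the goal itself if adjacent.
            sub_goal = goal if len(path) <= 2 else landmarks[path[1]]
        obs, reward, terminal, info = env.step(policy(obs, sub_goal))
        if terminal:
            break
```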
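The experiment-setup quote pins down only a few numbers: 200-timestep training episodes, an initial distribution ρ₀ that is uniform over the maze, and a 500-step test horizon with the agent and goal on opposite ends. A minimal, hypothetical configuration restating those values (the key names are not from the paper or its codebase):

```python
# Hypothetical configuration keys; the values restate the quoted setup.
train_setup = {
    "episode_horizon": 200,                                  # 200-timestep training episodes
    "initial_state_distribution": "uniform over the maze",   # rho_0
}
test_setup = {
    "episode_horizon": 500,                                  # 500-step test episodes
    "agent_start": "one end of the maze",
    "goal_position": "opposite end of the maze",
}
```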