World Model as a Graph: Learning Latent Landmarks for Planning

Authors: Lunjun Zhang, Ge Yang, Bradly C. Stadie

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Experiments and Evaluation: We investigate the impact of L3P in a variety of robotic manipulation and navigation environments. These include standard benchmarks such as Fetch-Pick-And-Place, and more difficult environments such as Ant Maze-Hard and Place-Inside-Box that have been engineered to require test-time generalization."
Researcher Affiliation | Academia | "1 University of Toronto, 2 Vector Institute, 3 MIT, 4 Toyota Technological Institute at Chicago"
Pseudocode | Yes | "Algorithm 1: Online Planning in L3P"
Open Source Code | Yes | "Code for L3P is available at: https://github.com/LunjunZhang/world-model-as-a-graph"
Open Datasets | Yes | "We investigate the impact of L3P in a variety of robotic manipulation and navigation environments. These include standard benchmarks such as Fetch-Pick-And-Place (Plappert et al., 2018; Andrychowicz et al., 2017), and more difficult environments such as Ant Maze-Hard and Place-Inside-Box that have been engineered to require test-time generalization."
Dataset Splits | No | The paper distinguishes training and test scenarios (e.g., "During training, the goal distribution...", "At test time, we always initialize the agent...") and labels environments as training or test environments, but it never specifies a validation split.
Hardware Specification | No | The paper provides no details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper names the software components and algorithms it builds on, such as DDPG and HER, but gives no version numbers for any software dependency.
Experiment Setup | Yes | "We use a short, 200-timestep time horizon during training and a ρ0 that is uniform in the maze. At test time, we always initialize the agent on one end of the maze, and set the goal on the other end. The horizon of the test environment is 500 steps."
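
The quoted setup above (a 200-step training horizon with a uniform initial-state distribution ρ0, versus a 500-step test horizon with the agent on one end of the maze and the goal on the other) can be sketched as follows. This is a minimal illustrative toy, not the paper's actual AntMaze code: the ToyMazeEnv class and its one-dimensional "maze" are assumptions introduced only to make the train/test configuration concrete.

```python
import random


class ToyMazeEnv:
    """Hypothetical goal-conditioned toy env illustrating the paper's horizon setup."""

    def __init__(self, horizon, uniform_init):
        self.horizon = horizon            # 200 during training, 500 at test time
        self.uniform_init = uniform_init  # True -> rho_0 uniform over the maze
        self.t = 0
        self.state = 0.0
        self.goal = 1.0

    def reset(self):
        self.t = 0
        if self.uniform_init:
            # Training: initial state and goal drawn uniformly over the maze.
            self.state = random.uniform(0.0, 1.0)
            self.goal = random.uniform(0.0, 1.0)
        else:
            # Test: agent starts on one end of the maze, goal on the other end.
            self.state, self.goal = 0.0, 1.0
        return self.state, self.goal

    def step(self, action):
        self.t += 1
        # Clip the agent's position to stay inside the (toy) maze.
        self.state = min(max(self.state + action, 0.0), 1.0)
        done = self.t >= self.horizon  # episode terminates at the horizon
        return self.state, done


# The two configurations described in the quote:
train_env = ToyMazeEnv(horizon=200, uniform_init=True)
test_env = ToyMazeEnv(horizon=500, uniform_init=False)
```

The only substantive difference between the two configurations is the episode horizon and the initial-state/goal distribution, which is exactly the generalization gap the quoted setup is designed to probe.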