World Model as a Graph: Learning Latent Landmarks for Planning
Authors: Lunjun Zhang, Ge Yang, Bradly C. Stadie
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments and Evaluation: We investigate the impact of L3P in a variety of robotic manipulation and navigation environments. These include standard benchmarks such as Fetch-Pick And Place, and more difficult environments such as Ant Maze-Hard and Place-Inside-Box that have been engineered to require test-time generalization. |
| Researcher Affiliation | Academia | ¹University of Toronto, ²Vector Institute, ³MIT, ⁴Toyota Technological Institute at Chicago. |
| Pseudocode | Yes | Algorithm 1 Online Planning in L3P (see the illustrative sketch after the table) |
| Open Source Code | Yes | Code for L3P is available at: https://github.com/LunjunZhang/world-model-as-a-graph. |
| Open Datasets | Yes | We investigate the impact of L3P in a variety of robotic manipulation and navigation environments. These include standard benchmarks such as Fetch-Pick And Place (Plappert et al., 2018; Andrychowicz et al., 2017), and more difficult environments such as Ant Maze-Hard and Place-Inside-Box that have been engineered to require test-time generalization. |
| Dataset Splits | No | The paper discusses 'training' and 'test time' scenarios (e.g., 'During training, the goal distribution...', 'At test time, we always initialize the agent...'), and references environments as 'training' or 'test' environments, but it does not explicitly mention or specify a 'validation' dataset split. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms used, such as DDPG and HER, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We use a short, 200-timestep time horizon during training and a ρ₀ that is uniform in the maze. At test time, we always initialize the agent on one end of the maze, and set the goal on the other end. The horizon of the test environment is 500 steps. (See the configuration sketch after the table.) |
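The paper's Algorithm 1 ("Online Planning in L3P") is only named in the table above, not reproduced. As a rough illustration of what online planning over a graph of latent landmarks can look like, the sketch below runs a shortest-path search over landmark-to-landmark distance estimates and hands the next landmark to a goal-conditioned policy as a sub-goal. Every name here (`encode`, `policy`, `landmarks`, `pairwise_dist`) and the environment interface are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

# Hedged sketch only: `encode`, `policy`, `landmarks`, and `pairwise_dist`
# are hypothetical placeholders, not the authors' released API.

def dijkstra_path(dist, start, goal):
    """Shortest path over a landmark graph whose edge weights dist[i, j]
    are estimated temporal distances (e.g. derived from a value function)."""
    n = dist.shape[0]
    best = np.full(n, np.inf)
    prev = np.full(n, -1, dtype=int)
    done = np.zeros(n, dtype=bool)
    best[start] = 0.0
    for _ in range(n):
        u = int(np.argmin(np.where(done, np.inf, best)))
        if not np.isfinite(best[u]):
            break
        done[u] = True
        relax = best[u] + dist[u]
        improved = (~done) & (relax < best)
        prev[improved] = u
        best = np.where(improved, relax, best)
    path, node = [], goal
    while node != -1:
        path.append(node)
        node = prev[node]
    return path[::-1]

def landmark_planning_episode(env, encode, policy, landmarks, pairwise_dist,
                              max_steps=500, replan_every=10):
    """Attach the current state and the goal to the landmark graph, plan a
    shortest path, and feed the next landmark to a goal-conditioned policy
    as a sub-goal; replan periodically."""
    obs, goal = env.reset()  # assumed goal-conditioned interface
    sub_goal = goal
    for t in range(max_steps):
        if t % replan_every == 0:
            # Distance matrix over [landmarks..., current state, goal];
            # the last two rows/columns act as temporary graph nodes.
            dist = pairwise_dist(landmarks, encode(obs), encode(goal))
            n = dist.shape[0]
            path = dijkstra_path(dist, start=n - 2, goal=n - 1)
            # Next landmark on the path, or the goal itself if adjacent.
            sub_goal = goal if len(path) <= 2 else landmarks[path[1]]
        obs, reward, terminal, info = env.step(policy(obs, sub_goal))
        if terminal:
            break
```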
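The experiment-setup quote pins down only a few numbers: 200-timestep training episodes, an initial distribution ρ₀ that is uniform over the maze, and a 500-step test horizon with the agent and goal on opposite ends. A minimal, hypothetical configuration restating those values (the key names are not from the paper or its codebase):

```python
# Hypothetical configuration keys; the values restate the quoted setup.
train_setup = {
    "episode_horizon": 200,                                  # 200-timestep training episodes
    "initial_state_distribution": "uniform over the maze",   # rho_0
}
test_setup = {
    "episode_horizon": 500,                                  # 500-step test episodes
    "agent_start": "one end of the maze",
    "goal_position": "opposite end of the maze",
}
```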