reproducibilityindex.ai

Mapping State Space using Landmarks for Universal Goal Reaching

Authors: Zhiao Huang, Fangchen Liu, Hao Su

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimentally we showed that our method enables the agent to reach long-range goals at the early training stage, and achieve better performance than standard RL algorithms for a number of challenging tasks.
Researcher Affiliation	Academia	Zhiao Huang UC San Diego z2huang@eng.ucsd.edu Fangchen Liu UC San Diego fliu@eng.ucsd.edu Hao Su UC San Diego haosu@eng.ucsd.edu
Pseudocode	Yes	Algorithm 1: Planning with State-space Mapping (Planner)
Open Source Code	No	The paper does not include any explicit statement about releasing open-source code for the methodology or a link to a code repository.
Open Datasets	Yes	Example universal goal reaching environments include labyrinth walking (e.g., Ant Maze [31]) and robot arm control (e.g., Fetch Reach [32]).
Dataset Splits	No	The paper mentions training and testing scenarios ('For training, the agent is born at a random position to reach a random goal in the maze. For testing, the agent should reach the other side of the U-Maze within 500 steps.') but does not specify explicit training/validation/test dataset splits, percentages, or validation procedures.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments.
Software Dependencies	No	The paper mentions software components like 'DQN', 'HER', 'DDPG', 'MuJoCo', and 'OpenAI Gym' with citations, but does not specify their version numbers or other software dependencies with specific versioning.
Experiment Setup	Yes	There are two main hyper-parameters for the planner: the number of landmarks and the edge clipping threshold τ. Figure 6a shows the evaluation result of the model trained after 0.8M steps in Ant Maze. We see that our method is generally robust under different choices of hyper-parameters. Here τ is the negative distance between landmarks.
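
To make the role of the edge clipping threshold τ concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical matrix of pairwise landmark distances (here built from made-up 1-D positions), keeps an edge only when its negative distance is at least τ, and then runs Floyd-Warshall over the clipped graph. The function names `clip_edges` and `all_pairs_shortest` are illustrative.

```python
import numpy as np


def clip_edges(dist, tau):
    """Keep edge i -> j only if its negative distance is at least tau.

    `dist` holds non-negative pairwise distances (hypothetical input);
    `tau` is negative, matching the convention that tau is a negative distance.
    Clipped edges get weight +inf, i.e. they are removed from the graph.
    """
    dist = np.asarray(dist, dtype=float)
    clipped = np.where(-dist >= tau, dist, np.inf)
    np.fill_diagonal(clipped, 0.0)
    return clipped


def all_pairs_shortest(weights):
    """Floyd-Warshall all-pairs shortest paths on a dense weight matrix."""
    d = weights.copy()
    for k in range(d.shape[0]):
        # Relax every pair (i, j) through intermediate node k.
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d


# Four made-up landmarks on a line at positions 0, 1, 2, 3.
positions = np.array([0.0, 1.0, 2.0, 3.0])
dist = np.abs(positions[:, None] - positions[None, :])

graph = clip_edges(dist, tau=-1.5)    # only edges of length <= 1.5 survive
shortest = all_pairs_shortest(graph)  # long-range distances via short hops
```

In this toy setup the direct edge between the first and last landmark is clipped, so the planner-style distance between them is recovered by chaining the three short edges. A τ closer to zero removes more edges and relies more heavily on chaining.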