Mapping State Space using Landmarks for Universal Goal Reaching
Authors: Zhiao Huang, Fangchen Liu, Hao Su
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally we showed that our method enables the agent to reach long-range goals at the early training stage, and achieve better performance than standard RL algorithms for a number of challenging tasks. |
| Researcher Affiliation | Academia | Zhiao Huang, UC San Diego, z2huang@eng.ucsd.edu; Fangchen Liu, UC San Diego, fliu@eng.ucsd.edu; Hao Su, UC San Diego, haosu@eng.ucsd.edu |
| Pseudocode | Yes | Algorithm 1: Planning with State-space Mapping (Planner) |
| Open Source Code | No | The paper does not include any explicit statement about releasing open-source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | Example universal goal reaching environments include labyrinth walking (e.g., Ant Maze [31]) and robot arm control (e.g., Fetch Reach [32]). |
| Dataset Splits | No | The paper mentions training and testing scenarios ('For training, the agent is born at a random position to reach a random goal in the maze. For testing, the agent should reach the other side of the U-Maze within 500 steps.') but does not specify explicit training/validation/test dataset splits, percentages, or validation procedures. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'DQN', 'HER', 'DDPG', 'MuJoCo', and 'OpenAI Gym' with citations, but does not specify their version numbers or other software dependencies with specific versioning. |
| Experiment Setup | Yes | There are two main hyper-parameters for the planner: the number of landmarks and the edge clipping threshold τ. Figure 6a shows the evaluation result of the model trained after 0.8M steps in Ant Maze. We see that our method is generally robust under different choices of hyper-parameters. Here τ is the negative distance between landmarks. |
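
To make the role of the edge clipping threshold τ in the Experiment Setup row concrete, here is a minimal sketch and not the authors' implementation. It assumes that an edge between two landmarks is kept only when its negative distance is at least τ, and that the planner then runs shortest paths over the remaining edges. The function names `clip_edges` and `shortest_paths` and the hard-coded distance matrix are hypothetical. In the paper the distances would come from a learned model, not fixed numbers.

```python
import math

def clip_edges(dist, tau):
    """Keep edge (i, j) only if its negative distance is >= tau (assumed rule).

    dist is a square matrix of non-negative pairwise landmark distances.
    Clipped edges are replaced by math.inf, i.e. treated as absent.
    """
    n = len(dist)
    return [
        [dist[i][j] if i == j or -dist[i][j] >= tau else math.inf
         for j in range(n)]
        for i in range(n)
    ]

def shortest_paths(weights):
    """All-pairs shortest paths (Floyd-Warshall) over the landmark graph."""
    n = len(weights)
    d = [row[:] for row in weights]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Hypothetical example: three landmarks; the long-range estimate between
# landmarks 0 and 2 (1.5) is treated as unreliable and gets clipped.
dist = [
    [0.0, 1.0, 1.5],
    [1.0, 0.0, 1.0],
    [1.5, 1.0, 0.0],
]
tau = -1.2  # keep only edges with distance <= 1.2

clipped = clip_edges(dist, tau)
planned = shortest_paths(clipped)
```

With these numbers the direct edge between landmarks 0 and 2 is removed, so the planned distance between them is routed through landmark 1. A τ closer to zero clips more edges and relies more on short, local estimates. A more negative τ keeps more long-range edges.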