DHER: Hindsight Experience Replay for Dynamic Goals
Authors: Meng Fang, Cheng Zhou, Bei Shi, Boqing Gong, Jia Xu, Tong Zhang
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DHER on tasks of robotic manipulation and moving object tracking, and transfer the policies from simulation to physical robots. Extensive comparison and ablation studies demonstrate the superiority of our approach, showing that DHER is a crucial ingredient to enable RL to solve tasks with dynamic goals in manipulation and grid world domains. |
| Researcher Affiliation | Industry | Meng Fang, Cheng Zhou, Bei Shi, Boqing Gong, Jia Xu, Tong Zhang (Tencent AI Lab) |
| Pseudocode | Yes | Algorithm 1: Dynamic Hindsight Experience Replay with Experience Assembling (a hedged Python sketch of the assembling step follows the table) |
| Open Source Code | Yes | Our code and environments are available at https://github.com/mengf1/DHER. |
| Open Datasets | No | The paper describes modifying existing environments and creating new ones, but does not provide concrete access information (link, DOI, specific citation with author/year) for any publicly available dataset used for training. |
| Dataset Splits | No | The paper does not specify exact percentages, sample counts, or citations to predefined splits for training, validation, or test sets. |
| Hardware Specification | Yes | We use the Universal Robots UR10 with a gripper. We use a RealSense Camera SR300 to track the position of objects. |
| Software Dependencies | No | The paper mentions the 'MuJoCo physics engine' but does not provide a specific version number for it or any other software dependency. |
| Experiment Setup | Yes | Goals are positions in the 3D world coordinate system with a fixed tolerance (we use ε = 0.01 for the tolerance). The velocity we use is v = 0.011. Rewards are binary and sparse: r(s_t, a_t, g_t) = 1[ ‖s^{obj}_{t+1} − g_{t+1}‖ ≤ ε ] (see the reward sketch after the table). |
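
The "Pseudocode" row refers to Algorithm 1 in the paper. Below is a minimal Python sketch of the experience-assembling idea it names: when two failed episodes "intersect" (the goal position actually reached in one episode matches the desired goal position recorded in the other), the two are spliced into a new experience with a dynamic hindsight goal. The episode layout (dicts with `achieved_goal`/`desired_goal` keys holding numpy arrays), the function name, and the brute-force pair search are illustrative assumptions, not the authors' implementation; the real code is in the linked GitHub repository.

```python
import numpy as np

def assemble_experience(failed_episodes, eps=0.01):
    """Hedged sketch of DHER's experience assembling (Algorithm 1).

    For a pair of failed episodes (Ei, Ej), find steps p, q where the
    goal Ei actually achieved coincides with the goal Ej desired. Then
    replay Ei's transitions against Ej's (moving) goal trajectory, so
    the assembled episode ends in success.
    """
    assembled = []
    for i, ep_i in enumerate(failed_episodes):
        for j, ep_j in enumerate(failed_episodes):
            if i == j:
                continue
            for p, tr_i in enumerate(ep_i):
                for q, tr_j in enumerate(ep_j):
                    gap = np.linalg.norm(tr_i["achieved_goal"] - tr_j["desired_goal"])
                    if gap > eps:
                        continue  # trajectories do not intersect at (p, q)
                    m = min(p, q)  # length of the overlapping prefix
                    new_ep = []
                    for t in range(m + 1):
                        src = ep_i[p - m + t]                   # behaviour from Ei
                        goal = ep_j[q - m + t]["desired_goal"]  # hindsight goal from Ej
                        # Same-step indicator reward, simplified from the
                        # t+1 indexing in the paper's reward definition.
                        reward = float(np.linalg.norm(src["achieved_goal"] - goal) <= eps)
                        new_ep.append({**src, "desired_goal": goal, "reward": reward})
                    assembled.append(new_ep)
    return assembled
```

Each transition dict is assumed to also carry `state` and `action`; the assembled episodes would then be pushed into the replay buffer alongside real experience.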
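
The setup row quotes a success tolerance of ε = 0.01 and a goal velocity of v = 0.011. The sketch below spells out the binary sparse reward and a constant-velocity goal update under those values; the function names and the assumption that the goal moves along a fixed unit direction are illustrative, not taken from the paper's code.

```python
import numpy as np

EPSILON = 0.01    # success tolerance quoted in the table above
VELOCITY = 0.011  # per-step goal speed quoted in the table above

def sparse_reward(achieved_next, goal_next, eps=EPSILON):
    """r(s_t, a_t, g_t) = 1[ ||s^{obj}_{t+1} - g_{t+1}|| <= eps ]."""
    diff = np.asarray(achieved_next) - np.asarray(goal_next)
    return float(np.linalg.norm(diff) <= eps)

def advance_goal(goal, unit_direction, v=VELOCITY):
    """Move the dynamic goal one step along a fixed direction at speed v."""
    return np.asarray(goal) + v * np.asarray(unit_direction)

# Example: the object sits exactly where the goal is after one step.
g = advance_goal(np.zeros(3), [1.0, 0.0, 0.0])
print(sparse_reward(g, g))  # -> 1.0
```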