Policy Continuation with Hindsight Inverse Dynamics
Authors: Hao Sun, Zhizhong Li, Xiaotong Liu, Bolei Zhou, Dahua Lin
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On two multi-goal tasks Grid World and Fetch Reach, PCHID significantly improves the sample efficiency as well as the final performance. ... 4 Experiments ... Our empirical results are shown in Fig.3. |
| Researcher Affiliation | Academia | 1The Chinese University of Hong Kong, 2Peking University |
| Pseudocode | Yes | Algorithm 1 Policy Continuation with Hindsight Inverse Dynamics (PCHID) |
| Open Source Code | Yes | Code and related materials are available at https://sites.google.com/view/neurips2019pchid |
| Open Datasets | Yes | On two multi-goal tasks Grid World and Fetch Reach, PCHID significantly improves the sample efficiency as well as the final performance. ... The Fetch environments... provided by Plappert et al. [3]. ... We use the Grid World navigation task in Value Iteration Networks (VIN) [31] |
| Dataset Splits | No | No explicit mention of training/validation/test dataset splits with percentages or sample counts. The paper mentions training for 500 episodes and testing on unseen maps. |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, or memory) used for experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) are mentioned in the paper. |
| Experiment Setup | Yes | We train our agent for 500 episodes in total so that the agent needs to learn to navigate within just 500 trials... A reward of 10 will be provided if the agent reaches the goal within 50 timesteps, otherwise the agent will receive a reward of 0.02. ... Action is a continuous 4-dimensional vector with the first three of them controlling movement of the gripper and the last one controlling opening and closing of the gripper. ... The agent will get a reward of 0 if the object is at the target location within a tolerance or -1 otherwise. |
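
For context on the Pseudocode and Experiment Setup rows above, the snippet below is a minimal sketch of the 1-step Hindsight Inverse Dynamics update at the core of Algorithm 1 (PCHID). It is not the authors' released code: the environment id (`FetchReach-v1`), network width, buffer and batch sizes, and the random exploration policy are assumptions, and the full algorithm further extends the solvable horizon beyond one step via policy continuation.

```python
# Minimal sketch of 1-step Hindsight Inverse Dynamics (HID), the supervised
# step at the core of PCHID. Environment id, network size, and rollout
# scheme are assumptions, not the authors' implementation.
import gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("FetchReach-v1")   # sparse reward: 0 within tolerance of the goal, -1 otherwise
obs_dim = env.observation_space["observation"].shape[0]
goal_dim = env.observation_space["desired_goal"].shape[0]
act_dim = env.action_space.shape[0]            # 4-dim: 3 gripper-movement dims + 1 open/close

policy = nn.Sequential(                         # pi(a | state, goal), trained by regression
    nn.Linear(obs_dim + goal_dim, 64), nn.ReLU(),
    nn.Linear(64, act_dim), nn.Tanh(),          # actions bounded in [-1, 1]
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

buffer = []                                     # (state_t, achieved_goal_{t+1}, action_t)
for episode in range(100):
    obs = env.reset()
    for t in range(50):
        action = env.action_space.sample()      # exploratory rollout with a random policy
        next_obs, reward, done, info = env.step(action)
        # Hindsight relabelling: the goal actually achieved at t+1 makes
        # (state_t, goal) -> action_t a correct 1-step inverse-dynamics pair.
        buffer.append((obs["observation"], next_obs["achieved_goal"], action))
        obs = next_obs
        if done:
            break

    if len(buffer) >= 64:                       # supervised update on a random minibatch
        idx = np.random.choice(len(buffer), size=64, replace=False)
        batch = [buffer[i] for i in idx]
        s = torch.as_tensor(np.stack([b[0] for b in batch]), dtype=torch.float32)
        g = torch.as_tensor(np.stack([b[1] for b in batch]), dtype=torch.float32)
        a = torch.as_tensor(np.stack([b[2] for b in batch]), dtype=torch.float32)
        loss = ((policy(torch.cat([s, g], dim=1)) - a) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```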