Policy Continuation with Hindsight Inverse Dynamics

Authors: Hao Sun, Zhizhong Li, Xiaotong Liu, Bolei Zhou, Dahua Lin

NeurIPS 2019

Reproducibility assessment. Each entry below gives the variable, the extracted result, and the supporting evidence (LLM response).
Research Type: Experimental
Evidence: "On two multi-goal tasks Grid World and Fetch Reach, PCHID significantly improves the sample efficiency as well as the final performance. ... 4 Experiments ... Our empirical results are shown in Fig.3."

Researcher Affiliation: Academia
Evidence: "1 The Chinese University of Hong Kong, 2 Peking University"

Pseudocode: Yes
Evidence: "Algorithm 1 Policy Continuation with Hindsight Inverse Dynamics (PCHID)". A hedged sketch of the core update appears after this table.

Open Source Code: Yes
Evidence: "Code and related materials are available at https://sites.google.com/view/neurips2019pchid"

Open Datasets: Yes
Evidence: "On two multi-goal tasks Grid World and Fetch Reach, PCHID significantly improves the sample efficiency as well as the final performance. ... The Fetch environments... provided by Plappert et al. [3]. ... We use the Grid World navigation task in Value Iteration Networks (VIN) [31]"
Dataset Splits: No
Evidence: No explicit training/validation/test splits (percentages or sample counts) are given; the paper mentions training for 500 episodes and testing on unseen maps.

Hardware Specification: No
Evidence: No hardware details (e.g., CPU or GPU model, memory) used for the experiments are mentioned in the paper.

Software Dependencies: No
Evidence: No software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) are mentioned in the paper.
Experiment Setup: Yes
Evidence: "We train our agent for 500 episodes in total so that the agent needs to learn to navigate within just 500 trials... A reward of 10 will be provided if the agent reaches the goal within 50 timesteps, otherwise the agent will receive a reward of 0.02. ... Action is a continuous 4-dimensional vector with the first three of them controlling movement of the gripper and the last one controlling opening and closing of the gripper. ... The agent will get a reward of 0 if the object is at the target location within a tolerance or -1 otherwise." A minimal interaction sketch for the Fetch Reach setup also follows the table.
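
The pseudocode entry above refers to the paper's Algorithm 1. As a rough illustration only, here is a minimal sketch of the 1-step hindsight inverse dynamics (HID) update at the heart of PCHID: a transition is relabeled so that the goal achieved at the next state becomes the desired goal, which turns policy learning into supervised regression on the action that was actually taken. All names here (PolicyNet, hid_update) are illustrative, not from the authors' code, and the full method's k-step policy continuation and TEST procedure are omitted.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """pi(a | s, g): maps a state-goal pair to an action (illustrative)."""
    def __init__(self, state_dim, goal_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def hid_update(policy, optimizer, transitions):
    """One supervised step of 1-step hindsight inverse dynamics.

    Each transition is (s_t, a_t, g_next), where g_next is the goal
    achieved at the next state. Relabeling g_next as the desired goal
    makes a_t the "correct" action for reaching it from s_t.
    """
    states = torch.stack([t[0] for t in transitions])
    actions = torch.stack([t[1] for t in transitions])
    achieved = torch.stack([t[2] for t in transitions])
    pred = policy(states, achieved)
    loss = nn.functional.mse_loss(pred, actions)  # continuous-action case
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```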
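
The Fetch Reach setup quoted in the Experiment Setup entry matches the standard multi-goal robotics interface of Plappert et al. The following is a hedged interaction sketch, assuming gym's 'FetchReach-v1' environment and the pre-0.26 gym API that was current in 2019; the exact environment version used in the paper may differ.

```python
import gym

env = gym.make('FetchReach-v1')
obs = env.reset()  # dict with 'observation', 'achieved_goal', 'desired_goal'

for t in range(50):  # episodes last 50 timesteps
    # 4-dimensional continuous action: the first three entries move the
    # gripper, the last one opens/closes it.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    # Sparse reward: 0 if the gripper is within tolerance of the goal,
    # -1 otherwise.
    if done:
        break
env.close()
```

This sparse 0/-1 reward is exactly the regime the paper targets: random exploration almost never observes reward 0, which is why hindsight-style relabeling improves sample efficiency.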