Policy Continuation with Hindsight Inverse Dynamics

Authors: Hao Sun, Zhizhong Li, Xiaotong Liu, Bolei Zhou, Dahua Lin

NeurIPS 2019

Reproducibility assessment. Each entry below gives the variable, the extracted result, and the supporting evidence (LLM response).
Research Type: Experimental
Evidence: "On two multi-goal tasks Grid World and Fetch Reach, PCHID significantly improves the sample efficiency as well as the final performance. ... 4 Experiments ... Our empirical results are shown in Fig.3."

Researcher Affiliation: Academia
Evidence: "1 The Chinese University of Hong Kong, 2 Peking University"

Pseudocode: Yes
Evidence: "Algorithm 1 Policy Continuation with Hindsight Inverse Dynamics (PCHID)". A hedged sketch of the core update appears after this table.

Open Source Code: Yes
Evidence: "Code and related materials are available at https://sites.google.com/view/neurips2019pchid"

Open Datasets: Yes
Evidence: "On two multi-goal tasks Grid World and Fetch Reach, PCHID significantly improves the sample efficiency as well as the final performance. ... The Fetch environments... provided by Plappert et al. [3]. ... We use the Grid World navigation task in Value Iteration Networks (VIN) [31]"
Dataset Splits: No
Evidence: No explicit training/validation/test splits (percentages or sample counts) are given; the paper mentions training for 500 episodes and testing on unseen maps.

Hardware Specification: No
Evidence: No hardware details (e.g., CPU or GPU model, memory) used for the experiments are mentioned in the paper.

Software Dependencies: No
Evidence: No software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) are mentioned in the paper.
Experiment Setup: Yes
Evidence: "We train our agent for 500 episodes in total so that the agent needs to learn to navigate within just 500 trials... A reward of 10 will be provided if the agent reaches the goal within 50 timesteps, otherwise the agent will receive a reward of 0.02. ... Action is a continuous 4-dimensional vector with the first three of them controlling movement of the gripper and the last one controlling opening and closing of the gripper. ... The agent will get a reward of 0 if the object is at the target location within a tolerance or -1 otherwise." A minimal interaction sketch for the Fetch Reach setup also follows the table.
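
The pseudocode entry above refers to the paper's Algorithm 1. As a rough illustration only, here is a minimal sketch of the 1-step hindsight inverse dynamics (HID) update at the heart of PCHID: a transition is relabeled so that the goal achieved at the next state becomes the desired goal, which turns policy learning into supervised regression on the action that was actually taken. All names here (PolicyNet, hid_update) are illustrative, not from the authors' code, and the full method's k-step policy continuation and TEST procedure are omitted.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """pi(a | s, g): maps a state-goal pair to an action (illustrative)."""
    def __init__(self, state_dim, goal_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def hid_update(policy, optimizer, transitions):
    """One supervised step of 1-step hindsight inverse dynamics.

    Each transition is (s_t, a_t, g_next), where g_next is the goal
    achieved at the next state. Relabeling g_next as the desired goal
    makes a_t the "correct" action for reaching it from s_t.
    """
    states = torch.stack([t[0] for t in transitions])
    actions = torch.stack([t[1] for t in transitions])
    achieved = torch.stack([t[2] for t in transitions])
    pred = policy(states, achieved)
    loss = nn.functional.mse_loss(pred, actions)  # continuous-action case
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```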
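
The Fetch Reach setup quoted in the Experiment Setup entry matches the standard multi-goal robotics interface of Plappert et al. The following is a hedged interaction sketch, assuming gym's 'FetchReach-v1' environment and the pre-0.26 gym API that was current in 2019; the exact environment version used in the paper may differ.

```python
import gym

env = gym.make('FetchReach-v1')
obs = env.reset()  # dict with 'observation', 'achieved_goal', 'desired_goal'

for t in range(50):  # episodes last 50 timesteps
    # 4-dimensional continuous action: the first three entries move the
    # gripper, the last one opens/closes it.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    # Sparse reward: 0 if the gripper is within tolerance of the goal,
    # -1 otherwise.
    if done:
        break
env.close()
```

This sparse 0/-1 reward is exactly the regime the paper targets: random exploration almost never observes reward 0, which is why hindsight-style relabeling improves sample efficiency.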