Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Policy Continuation with Hindsight Inverse Dynamics
Authors: Hao Sun, Zhizhong Li, Xiaotong Liu, Bolei Zhou, Dahua Lin
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On two multi-goal tasks Grid World and Fetch Reach, PCHID significantly improves the sample efficiency as well as the final performance. ... 4 Experiments ... Our empirical results are shown in Fig.3. |
| Researcher Affiliation | Academia | The Chinese University of Hong Kong; Peking University |
| Pseudocode | Yes | Algorithm 1 Policy Continuation with Hindsight Inverse Dynamics (PCHID) |
| Open Source Code | Yes | Code and related materials are available at https://sites.google.com/view/neurips2019pchid |
| Open Datasets | Yes | On two multi-goal tasks Grid World and Fetch Reach, PCHID significantly improves the sample efficiency as well as the final performance. ... The Fetch environments... provided by Plappert et al. [3]. ... We use the Grid World navigation task in Value Iteration Networks (VIN) [31] |
| Dataset Splits | No | No explicit mention of training/validation/test dataset splits with percentages or sample counts. The paper mentions training for 500 episodes and testing on unseen maps. |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, or memory) used for experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) are mentioned in the paper. |
| Experiment Setup | Yes | We train our agent for 500 episodes in total so that the agent needs to learn to navigate within just 500 trials... A reward of 10 will be provided if the agent reaches the goal within 50 timesteps, otherwise the agent will receive a reward of 0.02. ... Action is a continuous 4-dimensional vector with the first three of them controlling movement of the gripper and the last one controlling opening and closing of the gripper. ... The agent will get a reward of 0 if the object is at the target location within a tolerance or -1 otherwise. |
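To make the quoted Fetch setup concrete, the snippet below is a minimal sketch of the sparse reward (0 within a tolerance, -1 otherwise) and of how the 4-dimensional action splits into gripper movement and gripper opening/closing. It is illustrative only and is not the authors' code. The function names `fetch_sparse_reward` and `split_action` are hypothetical. The tolerance value of 0.05 is an assumption, because the quoted text says only "within a tolerance".

```python
import math
from typing import Sequence, Tuple

# Hypothetical tolerance: the quoted setup says only "within a tolerance";
# 0.05 is an illustrative assumption, not a value taken from the paper.
DEFAULT_TOLERANCE = 0.05


def fetch_sparse_reward(achieved_goal: Sequence[float],
                        desired_goal: Sequence[float],
                        tolerance: float = DEFAULT_TOLERANCE) -> float:
    """Return 0.0 if the achieved position is within `tolerance` of the goal, else -1.0."""
    distance = math.dist(achieved_goal, desired_goal)  # Euclidean distance
    return 0.0 if distance <= tolerance else -1.0


def split_action(action: Sequence[float]) -> Tuple[Tuple[float, ...], float]:
    """Split a 4-D action into (gripper movement, gripper open/close command)."""
    if len(action) != 4:
        raise ValueError(f"expected a 4-dimensional action, got {len(action)}")
    movement = tuple(action[:3])  # first three components move the gripper
    gripper = action[3]           # last component opens/closes the gripper
    return movement, gripper


if __name__ == "__main__":
    print(fetch_sparse_reward([0.0, 0.0, 0.0], [0.01, 0.0, 0.0]))
    print(split_action([0.1, -0.2, 0.3, 1.0]))
```

A sparse reward of this kind gives the agent no signal until it lands within the tolerance, which is the setting in which hindsight relabeling is typically motivated.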