Policy Optimization with Demonstrations

Authors: Bingyi Kang, Zequn Jie, Jiashi Feng

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that POfD induces implicit dynamic reward shaping and brings provable benefits for policy improvement. Furthermore, it can be combined with policy gradient methods to produce state-of-the-art results, as demonstrated experimentally on a range of popular benchmark sparse-reward tasks, even when the demonstrations are few and imperfect."
Researcher Affiliation | Collaboration | "(1) Department of Electrical and Computer Engineering, National University of Singapore, Singapore; (2) Tencent AI Lab, China."
Pseudocode | Yes | "Algorithm 1 Policy optimization with demonstrations" (hedged sketches of the underlying reward-shaping idea follow this table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for its method.
Open Datasets | Yes | "To comprehensively assess our method, we conduct extensive experiments on eight widely used physical control tasks, ranging from low-dimensional ones such as cartpole (Barto et al., 1983) and mountain car (Moore, 1990) to high-dimensional and naturally sparse environments based on OpenAI Gym (Brockman et al., 2016) and MuJoCo (Todorov et al., 2012)."
Dataset Splits | No | The paper mentions using
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | "Implementation Details: Due to space limit, we defer implementation details to the supplementary material."
Experiment Setup | No | "Implementation Details: Due to space limit, we defer implementation details to the supplementary material."
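
The abstract excerpt in the Research Type row says POfD induces implicit dynamic reward shaping. As a rough illustration only, and not a reproduction of the paper's equations, the idea can be written as a GAIL-style occupancy-measure-matching objective; the trade-off weight \(\lambda_1\), the discriminator \(D_w\), and the choice of Jensen-Shannon divergence below are assumptions.

```latex
% Sketch only (assumed form, not quoted from the paper): maximize return
% while keeping the policy's occupancy measure close to the expert's.
\[
  \max_{\theta} \; \eta(\pi_\theta) \;-\; \lambda_1 \, D_{\mathrm{JS}}\!\bigl(\rho_{\pi_\theta} \,\|\, \rho_{E}\bigr)
\]
% With a discriminator D_w(s,a) approximating the divergence, this behaves
% like ordinary policy optimization under a reshaped reward:
\[
  r'(s,a) \;=\; r(s,a) \;-\; \lambda_1 \log D_w(s,a)
\]
```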
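Algorithm 1 is only named in the Pseudocode row; its body is not reproduced here. The snippet below is a minimal, hypothetical sketch of how such a reshaped reward could supplement a sparse environment reward, assuming a GAIL-style discriminator whose output is the probability that a (state, action) pair came from the current policy rather than from the demonstrations; names such as `lambda1` and `reshaped_reward` are illustrative, not taken from the paper's code.

```python
import numpy as np

def reshaped_reward(env_reward, discriminator_prob, lambda1=0.1, eps=1e-8):
    """Combine a (possibly sparse) environment reward with a demonstration-matching bonus.

    discriminator_prob: probability in (0, 1) that the (state, action) pair was
    generated by the current policy rather than by the demonstrator, as produced
    by a GAIL-style discriminator (an assumption of this sketch).
    """
    # A low discriminator probability means the pair looks expert-like,
    # so -log(prob) yields a larger shaping bonus.
    shaping = -np.log(np.clip(discriminator_prob, eps, 1.0))
    return env_reward + lambda1 * shaping

# Example: even with a zero environment reward, an expert-like pair
# still produces a learning signal through the shaping term.
print(reshaped_reward(env_reward=0.0, discriminator_prob=0.2))
```

The shaping term changes as the discriminator is retrained against fresh policy rollouts, which is one way to read the abstract's phrase "dynamic reward shaping"; the combined reward can then be fed to any policy gradient method.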