Guarded Policy Optimization with Imperfect Online Demonstrations

Authors: Zhenghai Xue, Zhenghao Peng, Quanyi Li, Zhihan Liu, Bolei Zhou

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on various continuous control tasks show that our method can exploit teacher policies at different performance levels while maintaining a low training cost.
Researcher Affiliation | Academia | Nanyang Technological University, Singapore; University of California, Los Angeles; The University of Edinburgh; Northwestern University
Pseudocode | Yes | Algorithm 1: The workflow of TS2C during training (see the rollout sketch after the table)
Open Source Code | Yes | Code is available at https://metadriverse.github.io/TS2C.
Open Datasets | Yes | The majority of the experiments are conducted on the lightweight driving simulator MetaDrive (Li et al., 2022a). ... we also conduct experiments in several environments of the MuJoCo simulator (Todorov et al., 2012).
Dataset Splits | No | We choose 100 scenes for training and 50 held-out scenes for testing. (A seed-based split sketch follows the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments were provided.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) were explicitly stated.
Experiment Setup | Yes | The hyper-parameters used in the experiments are shown in the following tables. In the TS2C algorithm, larger values of the intervention thresholds ε1 and ε2 yield a stricter intervention criterion, so fewer steps are under teacher control. ... Table 1: TS2C (Ours) hyper-parameters: Discount Factor γ = 0.99, ..., Learning Rate = 0.0001, ... (See the config sketch after the table.)
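
To make the Algorithm 1 row concrete, the following is a minimal Python sketch of one guarded rollout, assuming a Gymnasium-style environment, student/teacher policy objects with an act(obs) method, and a callable q_teacher(obs, action) value estimate. The should_intervene rule and the eps threshold are illustrative assumptions, not the paper's exact criterion.

def should_intervene(q_teacher, obs, a_student, a_teacher, eps):
    # Hypothetical value-based criterion: the teacher takes over when its
    # Q-value for the student's action falls more than `eps` below the
    # Q-value of its own action. Consistent with the report, a larger
    # `eps` means a stricter criterion, so the teacher intervenes less often.
    return q_teacher(obs, a_teacher) - q_teacher(obs, a_student) > eps

def guarded_rollout(env, student, teacher, q_teacher, eps, max_steps=1000):
    obs, _ = env.reset()  # Gymnasium API: reset() returns (obs, info)
    trajectory = []
    for _ in range(max_steps):
        a_student = student.act(obs)
        a_teacher = teacher.act(obs)
        takeover = should_intervene(q_teacher, obs, a_student, a_teacher, eps)
        action = a_teacher if takeover else a_student
        next_obs, reward, terminated, truncated, _ = env.step(action)
        # Log the takeover flag so the learner can distinguish student
        # actions from teacher demonstrations in the mixed trajectory.
        trajectory.append((obs, action, reward, takeover))
        obs = next_obs
        if terminated or truncated:
            break
    return trajectory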
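
The Dataset Splits row can plausibly be reproduced in MetaDrive by giving the training and test environments disjoint seed ranges. The config keys below come from MetaDrive's public interface but are not confirmed by the paper, and the scenario-count key has been renamed across releases (environment_num in older versions, num_scenarios in newer ones).

from metadrive import MetaDriveEnv

# Assumed split: 100 procedurally generated training scenes and 50
# held-out test scenes, separated by disjoint seed ranges. The values
# are illustrative, not taken from the authors' released code.
train_env = MetaDriveEnv(dict(start_seed=0, num_scenarios=100))
test_env = MetaDriveEnv(dict(start_seed=100, num_scenarios=50))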
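
Finally, the Table 1 excerpt in the Experiment Setup row only preserves two values (γ = 0.99 and learning rate 0.0001). A small config object makes the role of the quoted thresholds explicit; the eps1/eps2 values below are placeholders, since the excerpt does not state the numbers the authors used.

from dataclasses import dataclass

@dataclass
class TS2CConfig:
    # Values taken from the Table 1 excerpt quoted above.
    discount_gamma: float = 0.99
    learning_rate: float = 1e-4
    # Placeholder intervention thresholds: per the report, larger values
    # give a stricter criterion and fewer teacher-control steps.
    eps1: float = 0.1
    eps2: float = 0.1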