Curriculum Offline Imitation Learning
Authors: Minghuan Liu, Hanye Zhao, Zhengyu Yang, Jian Shen, Weinan Zhang, Li Zhao, Tie-Yan Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids just learning a mediocre behavior on mixed datasets but is also even competitive with state-of-the-art offline RL methods. |
| Researcher Affiliation | Collaboration | Minghuan Liu1 Hanye Zhao1 Zhengyu Yang1 Jian Shen1 Weinan Zhang1 Li Zhao2 Tie-Yan Liu2 1 Shanghai Jiao Tong University, 2 Microsoft Research {minghuanliu, fineartz, zyyang, rockyshen, wnzhang}@sjtu.edu.cn, {lizo,tyliu}@microsoft.com |
| Pseudocode | Yes | The step-by-step algorithm is shown in Algo. 1. |
| Open Source Code | Yes | Codes are available at https://github.com/apexrl/COIL. |
| Open Datasets | Yes | To further show the power of COIL, we conduct comparison experiments on a common-used D4RL benchmark [5] in Tab. 2. |
| Dataset Splits | No | The paper refers to training iterations and 'online evaluation' but does not explicitly state training, validation, and test dataset splits with percentages or sample counts for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments, such as specific CPU or GPU models, or cloud computing resources with specifications. |
| Software Dependencies | No | The paper mentions using 'open-source implementation' for baselines and 'our implementation of BC' but does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | It is worth noting that COIL has only two critical hyperparameters, namely, the number of selected trajectories N and the moving window of the return filter α, both of which can be determined by the property of the dataset. Specifically, N is related to the average discrepancy between the sampling policies in the dataset; α is influenced by the changes of the return of the trajectories contained in the dataset. In the ablation study Section 6.3 and Appendix E.2, we demonstrate how we select different hyperparameters for different datasets. |
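The setup row above describes COIL's two critical hyperparameters: the number of selected trajectories N and the moving window α of the return filter. As a reading aid, the sketch below illustrates one plausible interpretation of that curriculum step: pick the N trajectories the current policy assigns the highest likelihood, among those whose return clears a floor, then move the floor toward the returns just imitated. The function names, the dictionary-based trajectory format, and the exponential-style filter update are all assumptions for illustration, not the paper's actual implementation (see the authors' code at https://github.com/apexrl/COIL for the real algorithm).

```python
def select_curriculum(trajectories, log_prob_fn, n_select, return_floor):
    """Hypothetical sketch of a COIL-style selection step: keep trajectories
    whose return clears the filter, then take the n_select trajectories the
    current policy is most likely to have generated (a proxy for closeness
    between the behavior policy and the learner)."""
    eligible = [t for t in trajectories if t["return"] >= return_floor]
    eligible.sort(key=log_prob_fn, reverse=True)
    return eligible[:n_select]


def update_return_filter(return_floor, selected, alpha):
    """Hypothetical moving-window return filter: nudge the floor toward the
    mean return of the trajectories just imitated, weighted by alpha."""
    if not selected:
        return return_floor
    mean_ret = sum(t["return"] for t in selected) / len(selected)
    return (1 - alpha) * return_floor + alpha * mean_ret


# Toy dataset: trajectories summarized by id and episode return.
trajs = [
    {"id": 0, "return": 2.0},
    {"id": 1, "return": 5.0},
    {"id": 2, "return": 8.0},
    {"id": 3, "return": 1.0},
]

# Stand-in likelihood: pretend the current policy is closest to
# mid-return behavior (a real implementation would score state-action
# pairs under the learned policy).
log_prob = lambda t: -abs(t["return"] - 5.0)

picked = select_curriculum(trajs, log_prob, n_select=2, return_floor=2.0)
print([t["id"] for t in picked])                      # → [1, 0]
print(update_return_filter(2.0, picked, alpha=0.5))   # → 2.75
```

Under this reading, N trades off how far each curriculum stage may stray from the current policy, while α controls how aggressively the return floor chases the best behavior seen so far, which matches the paper's note that both can be tuned from properties of the dataset.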