Rethinking Goal-Conditioned Supervised Learning and Its Connection to Offline RL

Authors: Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, Chongjie Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS: Experiments in the introduced benchmark demonstrate that WGCSL can consistently outperform GCSL and existing state-of-the-art offline methods in the fully offline goal-conditioned setting.
Researcher Affiliation | Collaboration | Rui Yang1, Yiming Lu1, Wenzhe Li1, Hao Sun2, Meng Fang3, Yali Du4, Xiu Li1, Lei Han5, Chongjie Zhang1 (1Tsinghua University, 2University of Cambridge, 3Eindhoven University of Technology, 4King's College London, 5Tencent Robotics X)
Pseudocode | Yes | Algorithm 1: Weighted Goal-Conditioned Supervised Learning (a hedged sketch of this update appears after the table)
Open Source Code | Yes | Code and offline dataset are available at https://github.com/YangRui2015/AWGCSL
Open Datasets | Yes | For the evaluation of offline goal-conditioned RL algorithms, we provide a public benchmark and offline datasets including a range of point and simulated robot domains. Code and offline dataset are available at https://github.com/YangRui2015/AWGCSL
Dataset Splits | No | No explicit train/validation/test dataset splits were described with specific percentages, sample counts, or citations to predefined splits. The paper describes training on the full offline dataset and then evaluating the learned policy.
Hardware Specification | Yes | The training time would vary across different platforms; we use a single GPU (Tesla P100 PCIe 16GB) and 5 CPU cores (Intel Xeon E5-2680 v4 @ 2.40GHz) to run 5 random seeds in parallel.
Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'relu activation' but does not provide specific version numbers for these or other core libraries like PyTorch or Python itself.
Experiment Setup | Yes | The policy networks of GCSL and WGCSL are 3-layer MLPs with 256 units per layer and ReLU activation... The batch size is set to 128 for the first 6 tasks and 512 for the 4 harder tasks... We use the Adam optimizer with a learning rate of 5 × 10^−4. (A configuration sketch follows the table.)
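
To make the pseudocode row concrete, here is a minimal PyTorch sketch of one WGCSL update in the spirit of Algorithm 1, not the authors' implementation. It assumes a deterministic policy trained by weighted regression on hindsight-relabeled actions, with a weight combining the discounted relabeling weight gamma^(t'-t) and a clipped exponential advantage term; the paper's best-advantage indicator is omitted for brevity, and `policy`, `value_fn`, and the batch layout are hypothetical stand-ins.

```python
# Minimal sketch of one WGCSL update in the spirit of Algorithm 1 (not the
# authors' code). Assumes a deterministic policy trained by weighted
# regression on hindsight-relabeled data; `policy` and `value_fn` are
# hypothetical callables mapping (state, goal) batches to tensors.
import torch

GAMMA = 0.98     # discount; also drives the discounted relabeling weight
BETA = 1.0       # temperature of the exponential advantage weight (assumed)
W_MAX = 10.0     # clip on exp(advantage) to keep weights bounded (assumed)

def wgcsl_update(policy, value_fn, optimizer, batch):
    obs, act, goal = batch["obs"], batch["act"], batch["goal"]

    with torch.no_grad():
        # Discounted relabeling weight: gamma^(t' - t), where t' is the
        # timestep at which the relabeled goal was achieved.
        drw = GAMMA ** batch["delta_t"]                      # shape [B]
        # One-step advantage A(s, a, g') = r + gamma * V(s', g') - V(s, g')
        v = value_fn(obs, goal)                              # shape [B]
        v_next = value_fn(batch["next_obs"], goal)           # shape [B]
        adv = batch["reward"] + GAMMA * (1.0 - batch["done"]) * v_next - v
        # Clipped exponential advantage weight
        eaw = torch.clamp(torch.exp(BETA * adv), max=W_MAX)
        weight = drw * eaw

    # Weighted regression of the policy onto the relabeled actions
    se = ((policy(obs, goal) - act) ** 2).sum(dim=-1)        # shape [B]
    loss = (weight * se).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```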
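
And for the experiment-setup row, a configuration sketch of the reported policy network and optimizer. PyTorch, the reading of "3-layer" as three hidden layers, and the task dimensions are assumptions; the paper does not name a framework or version.

```python
# Sketch of the reported setup: 3-layer MLP policy with 256 units per layer,
# ReLU activations, Adam at lr 5e-4. Framework choice (PyTorch) and the
# task dimensions below are assumptions, not taken from the paper.
import torch.nn as nn
import torch.optim as optim

obs_dim, goal_dim, act_dim = 10, 3, 4   # hypothetical dimensions

policy = nn.Sequential(
    nn.Linear(obs_dim + goal_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = optim.Adam(policy.parameters(), lr=5e-4)

batch_size = 128   # 512 for the 4 harder tasks, per the paper
```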