Rethinking Goal-Conditioned Supervised Learning and Its Connection to Offline RL

Authors: Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, Chongjie Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS: Experiments in the introduced benchmark demonstrate that WGCSL can consistently outperform GCSL and existing state-of-the-art offline methods in the fully offline goal-conditioned setting.
Researcher Affiliation | Collaboration | Rui Yang1, Yiming Lu1, Wenzhe Li1, Hao Sun2, Meng Fang3, Yali Du4, Xiu Li1, Lei Han5, Chongjie Zhang1 (1Tsinghua University, 2University of Cambridge, 3Eindhoven University of Technology, 4King's College London, 5Tencent Robotics X)
Pseudocode | Yes | Algorithm 1: Weighted Goal-Conditioned Supervised Learning (a hedged sketch of this update appears after the table)
Open Source Code | Yes | Code and offline dataset are available at https://github.com/YangRui2015/AWGCSL
Open Datasets | Yes | For the evaluation of offline goal-conditioned RL algorithms, we provide a public benchmark and offline datasets including a range of point and simulated robot domains. Code and offline dataset are available at https://github.com/YangRui2015/AWGCSL
Dataset Splits | No | No explicit train/validation/test dataset splits were described with specific percentages, sample counts, or citations to predefined splits. The paper describes training on the full offline dataset and then evaluating the learned policy.
Hardware Specification | Yes | The training time would vary across different platforms; we use a single GPU (Tesla P100 PCIe 16GB) and 5 CPU cores (Intel Xeon E5-2680 v4 @ 2.40GHz) to run 5 random seeds in parallel.
Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'relu activation' but does not provide specific version numbers for these or other core libraries like PyTorch or Python itself.
Experiment Setup | Yes | The policy networks of GCSL and WGCSL are 3-layer MLPs with 256 units per layer and ReLU activation... The batch size is set to 128 for the first 6 tasks and 512 for the 4 harder tasks... We use the Adam optimizer with a learning rate of 5 × 10^−4. (A configuration sketch follows the table.)
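
To make the pseudocode row concrete, here is a minimal PyTorch sketch of one WGCSL update in the spirit of Algorithm 1, not the authors' implementation. It assumes a deterministic policy trained by weighted regression on hindsight-relabeled actions, with a weight combining the discounted relabeling weight gamma^(t'-t) and a clipped exponential advantage term; the paper's best-advantage indicator is omitted for brevity, and `policy`, `value_fn`, and the batch layout are hypothetical stand-ins.

```python
# Minimal sketch of one WGCSL update in the spirit of Algorithm 1 (not the
# authors' code). Assumes a deterministic policy trained by weighted
# regression on hindsight-relabeled data; `policy` and `value_fn` are
# hypothetical callables mapping (state, goal) batches to tensors.
import torch

GAMMA = 0.98     # discount; also drives the discounted relabeling weight
BETA = 1.0       # temperature of the exponential advantage weight (assumed)
W_MAX = 10.0     # clip on exp(advantage) to keep weights bounded (assumed)

def wgcsl_update(policy, value_fn, optimizer, batch):
    obs, act, goal = batch["obs"], batch["act"], batch["goal"]

    with torch.no_grad():
        # Discounted relabeling weight: gamma^(t' - t), where t' is the
        # timestep at which the relabeled goal was achieved.
        drw = GAMMA ** batch["delta_t"]                      # shape [B]
        # One-step advantage A(s, a, g') = r + gamma * V(s', g') - V(s, g')
        v = value_fn(obs, goal)                              # shape [B]
        v_next = value_fn(batch["next_obs"], goal)           # shape [B]
        adv = batch["reward"] + GAMMA * (1.0 - batch["done"]) * v_next - v
        # Clipped exponential advantage weight
        eaw = torch.clamp(torch.exp(BETA * adv), max=W_MAX)
        weight = drw * eaw

    # Weighted regression of the policy onto the relabeled actions
    se = ((policy(obs, goal) - act) ** 2).sum(dim=-1)        # shape [B]
    loss = (weight * se).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```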
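
And for the experiment-setup row, a configuration sketch of the reported policy network and optimizer. PyTorch, the reading of "3-layer" as three hidden layers, and the task dimensions are assumptions; the paper does not name a framework or version.

```python
# Sketch of the reported setup: 3-layer MLP policy with 256 units per layer,
# ReLU activations, Adam at lr 5e-4. Framework choice (PyTorch) and the
# task dimensions below are assumptions, not taken from the paper.
import torch.nn as nn
import torch.optim as optim

obs_dim, goal_dim, act_dim = 10, 3, 4   # hypothetical dimensions

policy = nn.Sequential(
    nn.Linear(obs_dim + goal_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = optim.Adam(policy.parameters(), lr=5e-4)

batch_size = 128   # 512 for the 4 harder tasks, per the paper
```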