Rethinking Goal-Conditioned Supervised Learning and Its Connection to Offline RL
Authors: Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, Chongjie Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments): "Experiments in the introduced benchmark demonstrate that WGCSL can consistently outperform GCSL and existing state-of-the-art offline methods in the fully offline goal-conditioned setting." |
| Researcher Affiliation | Collaboration | Rui Yang1, Yiming Lu1, Wenzhe Li1, Hao Sun2, Meng Fang3, Yali Du4, Xiu Li1, Lei Han5, Chongjie Zhang1 — 1Tsinghua University, 2University of Cambridge, 3Eindhoven University of Technology, 4King's College London, 5Tencent Robotics X |
| Pseudocode | Yes | Algorithm 1: Weighted Goal-Conditioned Supervised Learning |
| Open Source Code | Yes | Code and offline dataset are available at https://github.com/YangRui2015/AWGCSL |
| Open Datasets | Yes | For the evaluation of offline goal-conditioned RL algorithms, we provide a public benchmark and offline datasets including a range of point and simulated robot domains. Code and offline dataset are available at https://github.com/YangRui2015/AWGCSL |
| Dataset Splits | No | No explicit train/validation/test dataset splits were described with specific percentages, sample counts, or citations to predefined splits. The paper describes training on the full offline dataset and then evaluating the learned policy. |
| Hardware Specification | Yes | The training time would vary across different platforms; we use a single GPU (Tesla P100 PCIe 16GB) and 5 CPU cores (Intel Xeon E5-2680 v4 @ 2.40GHz) to run 5 random seeds in parallel. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'relu activation' but does not provide specific version numbers for these or other core libraries like PyTorch or Python itself. |
| Experiment Setup | Yes | The policy networks of GCSL and WGCSL are 3-layer MLP with 256 units each layer and relu activation... The batch size is set as 128 for the first 6 tasks, and 512 for 4 harder tasks... We use Adam optimizer with a learning rate of 5 × 10^−4. |
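The experiment-setup row above pins down the policy architecture and optimizer but not the exact loss form. A minimal PyTorch sketch of a weighted goal-conditioned supervised update consistent with those reported hyperparameters (3-layer MLP with 256 ReLU units, Adam at lr 5e-4, batch size 128) is shown below. The deterministic policy head, the squared-error regression target, and the placeholder uniform weights are assumptions for illustration; in WGCSL the per-sample weights come from the paper's weighting scheme (e.g. advantage-based), and GCSL is recovered by setting all weights to 1.

```python
import torch
import torch.nn as nn


class GoalConditionedPolicy(nn.Module):
    """3-layer MLP policy, 256 units per layer, ReLU activation (as quoted above)."""

    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # bounded continuous actions (assumption)
        )

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))


def wgcsl_step(policy, optimizer, obs, goals, actions, weights):
    """One weighted supervised update: regress relabeled actions, scaled per-sample by w_i."""
    pred = policy(obs, goals)
    per_sample_err = ((pred - actions) ** 2).mean(dim=-1)  # squared error per transition
    loss = (weights * per_sample_err).mean()               # weighted average over the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Hyperparameters quoted in the table: Adam optimizer, lr 5e-4, batch size 128.
policy = GoalConditionedPolicy(obs_dim=10, goal_dim=3, act_dim=4)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4)

obs = torch.randn(128, 10)
goals = torch.randn(128, 3)
actions = torch.randn(128, 4).clamp(-1, 1)
weights = torch.ones(128)  # uniform weights recover plain GCSL; WGCSL uses learned weights

loss = wgcsl_step(policy, optimizer, obs, goals, actions, weights)
```

Dimensions (`obs_dim=10`, `goal_dim=3`, `act_dim=4`) are arbitrary placeholders, not values from the paper.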