Learning to Reach Goals via Iterated Supervised Learning
Authors: Dibya Ghosh, Abhishek Gupta, Ashwin Reddy, Justin Fu, Coline Manon Devin, Benjamin Eysenbach, Sergey Levine
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We formally show that this iterated supervised learning procedure optimizes a bound on the RL objective, derive performance bounds of the learned policy, and empirically demonstrate improved goal-reaching performance and robustness over current RL algorithms in several benchmark tasks. |
| Researcher Affiliation | Academia | Dibya Ghosh (UC Berkeley), Abhishek Gupta (UC Berkeley), Ashwin Reddy (UC Berkeley), Justin Fu (UC Berkeley), Coline Devin (UC Berkeley), Benjamin Eysenbach (Carnegie Mellon University), Sergey Levine (UC Berkeley) |
| Pseudocode | Yes | Algorithm 1 Goal-Conditioned Supervised Learning (GCSL) |
| Open Source Code | Yes | We have additionally open-sourced our implementation at https://github.com/dibyaghosh/gcsl. |
| Open Datasets | Yes | Lunar Lander (Brockman et al., 2016): This environment requires a rocket to land in a specified region. |
| Dataset Splits | No | The paper describes data collection and training procedures but does not explicitly detail training/validation/test dataset splits with percentages or sample counts for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not provide version numbers for this or any other software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | The GCSL loss is optimized using the Adam optimizer with learning rate α = 5 × 10⁻⁴, with a batch size of 256, taking one gradient step for every step in the environment. |
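
For concreteness, the sketch below combines the Algorithm 1 pseudocode with the quoted experiment setup into a minimal GCSL training loop: roll out the current goal-conditioned policy toward a commanded goal, relabel each visited state-action pair with a state actually reached later in the same trajectory, and regress the policy onto the relabeled actions with Adam (learning rate 5 × 10⁻⁴, batch size 256, one gradient step per environment step). The network architecture, environment interface, goal extraction, and horizon are illustrative assumptions rather than the authors' implementation; the open-sourced code at https://github.com/dibyaghosh/gcsl is the authoritative reference.

```python
# Minimal sketch of GCSL (Algorithm 1) using the quoted optimizer settings.
# The policy architecture, environment interface, goal extraction, and
# horizon below are illustrative assumptions, not the authors' code.
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, GOAL_DIM, N_ACTIONS, HORIZON = 8, 2, 4, 50

# Assumed goal-conditioned policy pi(a | s, g) over discrete actions.
policy = nn.Sequential(
    nn.Linear(STATE_DIM + GOAL_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4)  # quoted setup
buffer = []  # list of trajectories, each a list of (state, action) pairs


def sample_relabeled_batch(batch_size=256):  # quoted batch size
    """Sample (s_t, a_t, g) tuples where the goal g is a state actually
    reached later in the same trajectory (hindsight relabeling)."""
    states, actions, goals = [], [], []
    while len(states) < batch_size:
        traj = random.choice(buffer)
        if len(traj) < 2:
            continue
        t = random.randrange(len(traj) - 1)
        k = random.randrange(t + 1, len(traj))
        s_t, a_t = traj[t]
        s_k, _ = traj[k]
        states.append(s_t)
        actions.append(a_t)
        goals.append(s_k[:GOAL_DIM])  # assume goals are a slice of the state
    return torch.stack(states), torch.tensor(actions), torch.stack(goals)


def gcsl_iteration(env, commanded_goal):
    """One outer iteration: collect a rollout toward a commanded goal, store
    it, then take one supervised gradient step per environment step taken."""
    state, trajectory = env.reset(), []  # env is assumed to return tensors
    for _ in range(HORIZON):
        with torch.no_grad():
            logits = policy(torch.cat([state, commanded_goal]))
        action = torch.distributions.Categorical(logits=logits).sample().item()
        trajectory.append((state, action))
        state, done = env.step(action)  # assumed minimal step() signature
        if done:
            break
    buffer.append(trajectory)

    for _ in range(len(trajectory)):  # one gradient step per env step
        s, a, g = sample_relabeled_batch()
        loss = F.cross_entropy(policy(torch.cat([s, g], dim=-1)), a)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The cross-entropy term stands in for the log-likelihood objective log π(a_t | s_t, g) that the paper maximizes; a separate evaluation loop that commands held-out goals would sit on top of this iteration.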