Learning to Reach Goals via Iterated Supervised Learning

Authors: Dibya Ghosh, Abhishek Gupta, Ashwin Reddy, Justin Fu, Coline Manon Devin, Benjamin Eysenbach, Sergey Levine

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We formally show that this iterated supervised learning procedure optimizes a bound on the RL objective, derive performance bounds of the learned policy, and empirically demonstrate improved goal-reaching performance and robustness over current RL algorithms in several benchmark tasks."
Researcher Affiliation | Academia | Dibya Ghosh (UC Berkeley), Abhishek Gupta (UC Berkeley), Ashwin Reddy (UC Berkeley), Justin Fu (UC Berkeley), Coline Devin (UC Berkeley), Benjamin Eysenbach (Carnegie Mellon University), Sergey Levine (UC Berkeley)
Pseudocode | Yes | "Algorithm 1: Goal-Conditioned Supervised Learning (GCSL)" (sketched below)
Open Source Code | Yes | "We have additionally open-sourced our implementation at https://github.com/dibyaghosh/gcsl."
Open Datasets | Yes | "Lunar Lander (Brockman et al., 2016): This environment requires a rocket to land in a specified region."
Dataset Splits | No | The paper describes data collection and training procedures but does not explicitly detail training/validation/test dataset splits with percentages or sample counts for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not provide version numbers for it or any other software dependency, which are necessary for full reproducibility.
Experiment Setup | Yes | "The GCSL loss is optimized using the Adam optimizer with learning rate α = 5 × 10⁻⁴, with a batch size of 256, taking one gradient step for every step in the environment."
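For context on the Pseudocode and Experiment Setup rows, the GCSL loop is short enough to sketch. The following is a minimal PyTorch illustration, not the authors' released implementation: the network architecture, environment dimensions, and the `relabel_batch` helper are assumptions made for this example, while the Adam learning rate (5 × 10⁻⁴), batch size (256), one-gradient-step-per-environment-step schedule, and hindsight-relabeling structure follow the quoted setup and Algorithm 1.

```python
import torch
import torch.nn as nn

# Hypothetical goal-conditioned policy: maps (state, goal) to action logits.
class GoalConditionedPolicy(nn.Module):
    def __init__(self, state_dim, goal_dim, num_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))


def relabel_batch(trajectories, batch_size=256):
    """Hindsight relabeling: sample (s_t, a_t, g) tuples where the goal g
    is a state the policy actually reached later in the same trajectory."""
    states, actions, goals = [], [], []
    for _ in range(batch_size):
        traj = trajectories[torch.randint(len(trajectories), (1,)).item()]
        horizon = len(traj["states"])                    # assumes horizon >= 2
        t = torch.randint(horizon - 1, (1,)).item()      # current time step
        k = torch.randint(t + 1, horizon, (1,)).item()   # future time step
        states.append(traj["states"][t])
        # Discrete actions assumed stored as 0-dim long tensors
        # (as they would be for Lunar Lander's four actions).
        actions.append(traj["actions"][t])
        goals.append(traj["states"][k])
    return torch.stack(states), torch.stack(actions), torch.stack(goals)


# Hyperparameters quoted in the paper: Adam, lr = 5e-4, batch size 256.
policy = GoalConditionedPolicy(state_dim=8, goal_dim=8, num_actions=4)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4)
loss_fn = nn.CrossEntropyLoss()


def gcsl_update(trajectories):
    """One supervised gradient step (the paper takes one per environment step)."""
    states, actions, goals = relabel_batch(trajectories)
    logits = policy(states, goals)
    # Maximize the log-likelihood of the actions that led to the relabeled goals.
    loss = loss_fn(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the full procedure, trajectories would be collected by rolling out the current policy toward commanded goals, appended to a replay buffer, and the relabel/update step iterated; the released code at https://github.com/dibyaghosh/gcsl is the authoritative reference for those details.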