SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations
Authors: Chan Kim, Jaekyung Cho, Christophe Bobda, Seung-Woo Seo, Seong-Woo Kim
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our in-depth experimental results demonstrate that our method substantially improves the agent's ability to recover from OOD situations in terms of sample efficiency and restoration of the performance for the original tasks. We conducted experiments on four OpenAI Gym MuJoCo environments to answer the above questions. |
| Researcher Affiliation | Academia | Chan Kim1, Jaekyung Cho1, Christophe Bobda2, Seung-Woo Seo1 and Seong-Woo Kim1; 1Seoul National University, 2University of Florida; {chan kim, jackyoung96, sseo, snwoo}@snu.ac.kr, cbobda@ece.ufl.edu |
| Pseudocode | No | A detailed explanation of the overall retraining procedure can be found in the supplementary material. |
| Open Source Code | Yes | Code and supplementary materials are available at https://github.com/SNUChanKim/SeRO. |
| Open Datasets | Yes | We used HalfCheetah-v2, Hopper-v2, Walker2d-v2, and Ant-v2 from Gym's MuJoCo environments [Brockman et al., 2016]. |
| Dataset Splits | No | The paper describes training and retraining phases in simulation environments but does not provide specific dataset split information (e.g., percentages or sample counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym's MuJoCo environments', 'SAC', and 'Python' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We first trained the agents in the training environments for 1 million steps using SAC. |