SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations

Authors: Chan Kim, Jaekyung Cho, Christophe Bobda, Seung-Woo Seo, Seong-Woo Kim

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our in-depth experimental results demonstrate that our method substantially improves the agent's ability to recover from OOD situations in terms of sample efficiency and restoration of the performance for the original tasks. We conducted experiments on four OpenAI Gym's MuJoCo environments to answer the above questions.
Researcher Affiliation | Academia | Chan Kim (1), Jaekyung Cho (1), Christophe Bobda (2), Seung-Woo Seo (1), and Seong-Woo Kim (1); (1) Seoul National University, (2) University of Florida; {chan kim, jackyoung96, sseo, snwoo}@snu.ac.kr, cbobda@ece.ufl.edu
Pseudocode | No | A detailed explanation of the overall retraining procedure can be found in the supplementary material.
Open Source Code | Yes | Code and supplementary materials are available at https://github.com/SNUChanKim/SeRO.
Open Datasets | Yes | We used HalfCheetah-v2, Hopper-v2, Walker2d-v2, and Ant-v2 from the Gym's MuJoCo environments [Brockman et al., 2016]. (An environment-setup sketch follows the table.)
Dataset Splits | No | The paper describes training and retraining phases in simulation environments but does not provide specific dataset split information (e.g., percentages or sample counts) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'OpenAI Gym's MuJoCo environments', 'SAC', and 'Python' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We first trained the agents in the training environments for 1 million steps using SAC. (A training-scale sketch follows the table.)
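
The four benchmark tasks quoted in the Open Datasets row are standard Gym MuJoCo environments. A minimal sketch of instantiating them is given below; it assumes an older gym release that still registers the -v2 MuJoCo IDs (backed by mujoco-py), and it is illustrative only, not the authors' code.

import gym

# The four benchmarks quoted in the Open Datasets row.
ENV_IDS = ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2", "Ant-v2"]

def make_envs(seed=0):
    """Instantiate and seed each benchmark environment (old gym API)."""
    envs = {}
    for env_id in ENV_IDS:
        env = gym.make(env_id)
        env.seed(seed)  # newer gym/gymnasium uses reset(seed=...) instead
        envs[env_id] = env
    return envs

if __name__ == "__main__":
    for name, env in make_envs().items():
        obs = env.reset()
        print(name, env.observation_space.shape, env.action_space.shape)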
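
The Experiment Setup row states that agents were first trained for 1 million steps with SAC. The authors' own training code is in their repository; the sketch below instead uses Stable-Baselines3's SAC as a stand-in (an assumption, not the paper's implementation) purely to illustrate the scale of that initial training phase.

import gym
from stable_baselines3 import SAC

# Initial training phase from the setup row: 1 million SAC steps per task.
# Hyperparameters are library defaults, not the paper's; illustrative only.
# Note: recent Stable-Baselines3 releases expect gymnasium environments, so
# an older SB3/gym pairing (or a compatibility wrapper) is assumed here.
env = gym.make("HalfCheetah-v2")
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("sac_halfcheetah_pretrained")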