Generalizing Reinforcement Learning through Fusing Self-Supervised Learning into Intrinsic Motivation
Authors: Keyu Wu, Min Wu, Zhenghua Chen, Yuecong Xu, Xiaoli Li
Pages: 8683-8690
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations have been performed on a diversity of tasks. Experimental results demonstrate that SIM consistently outperforms the state-of-the-art methods and exhibits superior generalization capability and sample efficiency. |
| Researcher Affiliation | Collaboration | Institute for Infocomm Research, A*STAR, Singapore {wu keyu, wumin}@i2r.a-star.edu.sg, {chen0832, xuyu0014}@e.ntu.edu.sg, xlli@i2r.a-star.edu.sg |
| Pseudocode | Yes | Algorithm 1: SIM |
| Open Source Code | Yes | Our code and more experimental results are available at https://github.com/KerryWu16/SIM. |
| Open Datasets | Yes | DeepMind Control Suite (DMControl) (Tassa et al. 2018) is a widely used benchmark dataset for vision-based RL algorithm comparison. |
| Dataset Splits | No | The paper does not explicitly provide specific training, validation, and testing dataset splits by percentages or counts. It refers to 'training environment' and 'unseen environments' for evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Adam' as an optimizer but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Table 1: Hyperparameters used for the DMControl experiments. Observation rendering: 100x100; observation downsampling: 84x84; stacked frames: 3; action repeat: 2 (finger), 8 (cartpole), 4 (otherwise); discount factor γ: 0.99; replay buffer size: 500,000; initial steps: 1000; learning rate (actor, critic, SSL): 1e-3; learning rate (α): 1e-4; initial temperature: 0.1; trade-off constant λ: 3.9e-3; update frequency (actor, critic target, SSL): 2. |
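
For anyone attempting a reproduction, the Table 1 values in the Experiment Setup row could be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `SIMConfig`, every field name, and the helper `action_repeat` are hypothetical and chosen only for illustration. Only the numeric values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SIMConfig:
    """Table 1 values. Field names are illustrative assumptions,
    not identifiers from the authors' repository."""
    render_size: int = 100             # observation rendering, 100x100
    obs_size: int = 84                 # observation downsampling, 84x84
    frame_stack: int = 3               # stacked frames
    discount: float = 0.99             # discount factor gamma
    replay_buffer_size: int = 500_000
    init_steps: int = 1000             # initial steps
    lr: float = 1e-3                   # learning rate for actor, critic, SSL
    alpha_lr: float = 1e-4             # learning rate for alpha
    init_temperature: float = 0.1
    tradeoff_lambda: float = 3.9e-3    # trade-off constant lambda
    update_freq: int = 2               # actor, critic target, SSL


def action_repeat(domain: str) -> int:
    """Action repeat per Table 1: 2 for finger, 8 for cartpole, 4 otherwise.

    Assumes `domain` is a bare domain name such as "finger" (hypothetical
    convention; the source does not say how tasks are named in code).
    """
    return {"finger": 2, "cartpole": 8}.get(domain, 4)


config = SIMConfig()
```

Freezing the dataclass keeps the reported values from being modified accidentally during a run; any deviation from Table 1 would then have to be made explicit, for example with `dataclasses.replace(config, lr=3e-4)`.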