Context Shift Reduction for Offline Meta-Reinforcement Learning
Authors: Yunkai Gao, Rui Zhang, Jiaming Guo, Fan Wu, Qi Yi, Shaohui Peng, Siming Lan, Ruizhi Chen, Zidong Du, Xing Hu, Qi Guo, Ling Li, Yunji Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that CSRO significantly reduces the context shift and improves the generalization ability, surpassing previous methods across various challenging domains. |
| Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, USTC, Hefei, China; 2 State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 3 Cambricon Technologies; 4 University of Chinese Academy of Sciences, UCAS, Beijing, China; 5 Intelligent Software Research Center, Institute of Software, CAS, Beijing, China; 6 Shanghai Innovation Center for Processor Technologies, SHIC, Shanghai, China |
| Pseudocode | Yes | We summarized our meta-training and meta-test method in the pseudo-code of Appendix A. |
| Open Source Code | Yes | Code is available at https://github.com/MoreanP/CSRO.git. |
| Open Datasets | Yes | We evaluate our method on Point-Robot and MuJoCo [29], which are often used as offline meta-RL benchmarks... The source code is taken from the rand_param_envs repository (https://github.com/dennisl88/rand_param_envs). |
| Dataset Splits | No | For each environment, we sampled 30 training tasks and 10 test tasks from the task distribution. ... Out of these, 30 environments are designated as training environments, while the remaining 10 environments serve as test environments. The paper specifies training and test tasks/environments (a minimal sampling sketch follows the table) but does not describe a separate validation split; there is no mention of a validation set or of how it would be used (e.g., for hyperparameter tuning). |
| Hardware Specification | No | The paper mentions the 'MuJoCo physics simulator [29]' and robots in various environments, which implies some computational setup, but it does not state the CPU, GPU, or other hardware used for training or experiments, nor any model numbers, processor types, or memory details. |
| Software Dependencies | No | Specific algorithms/frameworks are cited (SAC [10], BRAC [32], CLUB [3], Offline PEARL [27], FOCAL [20], CORRO [37], BOReL [4], MetaCURE [38], IDAQ [31]), but no version numbers are given for these packages or for the underlying languages and frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Table 3: Hyperparameters used in offline datasets collection. ... Table 5: Hyperparameters used in offline meta-training. |
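
As a concrete illustration of the 30 training / 10 test task split reported in the Dataset Splits row, below is a minimal sketch of drawing tasks from a task distribution and partitioning them. The `split_tasks` helper and the toy Point-Robot-style goal sampler are hypothetical placeholders, not the authors' code; their actual environment and task definitions live in the linked CSRO and rand_param_envs repositories.

```python
# Minimal sketch of a 30/10 train/test task split, assuming tasks are drawn
# i.i.d. from a task distribution. Helper names here are illustrative only.
import numpy as np

N_TRAIN_TASKS = 30  # training tasks per environment (as reported in the paper)
N_TEST_TASKS = 10   # held-out test tasks per environment (as reported in the paper)


def split_tasks(sample_task, n_train=N_TRAIN_TASKS, n_test=N_TEST_TASKS, seed=0):
    """Draw n_train + n_test tasks from a task distribution and split them.

    `sample_task` is any callable taking an RNG and returning one task
    specification (e.g. a goal position for Point-Robot or randomized
    physics parameters for a MuJoCo environment).
    """
    rng = np.random.default_rng(seed)
    tasks = [sample_task(rng) for _ in range(n_train + n_test)]
    return tasks[:n_train], tasks[n_train:]


if __name__ == "__main__":
    # Toy Point-Robot-style task sampler: a 2-D goal position in [-1, 1]^2.
    sample_goal = lambda rng: rng.uniform(-1.0, 1.0, size=2)
    train_tasks, test_tasks = split_tasks(sample_goal)
    print(len(train_tasks), len(test_tasks))  # 30 10
```

Note that this sketch only reproduces the reported split sizes; since the paper does not describe a validation set, any validation split for hyperparameter tuning would be an additional choice left to the reproducer.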