Context Shift Reduction for Offline Meta-Reinforcement Learning
Authors: Yunkai Gao, Rui Zhang, Jiaming Guo, Fan Wu, Qi Yi, Shaohui Peng, Siming Lan, Ruizhi Chen, Zidong Du, Xing Hu, Qi Guo, Ling Li, Yunji Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that CSRO significantly reduces the context shift and improves the generalization ability, surpassing previous methods across various challenging domains. |
| Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, USTC, Hefei, China; 2 State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 3 Cambricon Technologies; 4 University of Chinese Academy of Sciences, UCAS, Beijing, China; 5 Intelligent Software Research Center, Institute of Software, CAS, Beijing, China; 6 Shanghai Innovation Center for Processor Technologies, SHIC, Shanghai, China |
| Pseudocode | Yes | We summarized our meta-training and meta-test method in the pseudo-code of Appendix A. |
| Open Source Code | Yes | Code is available at https://github.com/MoreanP/CSRO.git. |
| Open Datasets | Yes | We evaluate our method on Point-Robot and MuJoCo [29], which are often used as offline meta-RL benchmarks... The source code is taken from the rand_param_envs repository (https://github.com/dennisl88/rand_param_envs). |
| Dataset Splits | No | For each environment, we sampled 30 training tasks and 10 test tasks from the task distribution. ... Out of these, 30 environments are designated as training environments, while the remaining 10 environments serve as test environments. The paper specifies training and test tasks/environments (a minimal sampling sketch follows the table) but does not describe a separate validation split; there is no mention of a validation set or of how it would be used (e.g., for hyperparameter tuning). |
| Hardware Specification | No | The paper mentions the 'MuJoCo physics simulator [29]' and robots in various environments, which implies some computational setup, but it does not state the CPU, GPU, or other hardware used for training or experiments, nor any model numbers, processor types, or memory details. |
| Software Dependencies | No | Specific algorithms/frameworks are cited (SAC [10], BRAC [32], CLUB [3], Offline PEARL [27], FOCAL [20], CORRO [37], BOReL [4], MetaCURE [38], IDAQ [31]), but no version numbers are given for these packages or for the underlying languages and frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Table 3: Hyperparameters used in offline datasets collection. ... Table 5: Hyperparameters used in offline meta-training. |
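
As a concrete illustration of the 30 training / 10 test task split reported in the Dataset Splits row, below is a minimal sketch of drawing tasks from a task distribution and partitioning them. The `split_tasks` helper and the toy Point-Robot-style goal sampler are hypothetical placeholders, not the authors' code; their actual environment and task definitions live in the linked CSRO and rand_param_envs repositories.

```python
# Minimal sketch of a 30/10 train/test task split, assuming tasks are drawn
# i.i.d. from a task distribution. Helper names here are illustrative only.
import numpy as np

N_TRAIN_TASKS = 30  # training tasks per environment (as reported in the paper)
N_TEST_TASKS = 10   # held-out test tasks per environment (as reported in the paper)


def split_tasks(sample_task, n_train=N_TRAIN_TASKS, n_test=N_TEST_TASKS, seed=0):
    """Draw n_train + n_test tasks from a task distribution and split them.

    `sample_task` is any callable taking an RNG and returning one task
    specification (e.g. a goal position for Point-Robot or randomized
    physics parameters for a MuJoCo environment).
    """
    rng = np.random.default_rng(seed)
    tasks = [sample_task(rng) for _ in range(n_train + n_test)]
    return tasks[:n_train], tasks[n_train:]


if __name__ == "__main__":
    # Toy Point-Robot-style task sampler: a 2-D goal position in [-1, 1]^2.
    sample_goal = lambda rng: rng.uniform(-1.0, 1.0, size=2)
    train_tasks, test_tasks = split_tasks(sample_goal)
    print(len(train_tasks), len(test_tasks))  # 30 10
```

Note that this sketch only reproduces the reported split sizes; since the paper does not describe a validation set, any validation split for hyperparameter tuning would be an additional choice left to the reproducer.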