Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
Authors: Jianhao Wang, Jin Zhang, Haozhe Jiang, Junyu Zhang, Liwei Wang, Chongjie Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that IDAQ achieves state-of-the-art performance on the Meta-World ML1 benchmark compared to baselines with/without offline adaptation. Empirical results show that IDAQ significantly outperforms baselines with fast online adaptation, and achieves better or comparable performance than offline adaptation baselines with expert context. |
| Researcher Affiliation | Academia | ¹Institute for Interdisciplinary Information Sciences, Tsinghua University; ²Huazhong University of Science and Technology; ³Institute of Artificial Intelligence, Peking University. |
| Pseudocode | Yes | The overall algorithm of IDAQ is illustrated in Algorithm 1. |
| Open Source Code | Yes | An open-source implementation of our algorithm is available online (https://github.com/NagisaZj/IDAQ_Public). |
| Open Datasets | Yes | We extensively evaluate the performance of IDAQ in didactic problems proposed by prior work (Rakelly et al., 2019; Zhang et al., 2021) and Meta-World ML1 benchmark with 50 tasks (Yu et al., 2020b). |
| Dataset Splits | No | For each task set, we use 40 tasks as meta-training tasks and hold out the other 10 tasks as meta-testing tasks. The paper does not explicitly mention a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific cloud instances used for running experiments. |
| Software Dependencies | No | The paper mentions software components like 'optimizer adam' and network structures but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Table 4 shows hyper-parameter settings for the task sets used in our experiments, and Table 5 shows IDAQ's hyper-parameter settings. |