Offline Meta Reinforcement Learning with In-Distribution Online Adaptation

Authors: Jianhao Wang, Jin Zhang, Haozhe Jiang, Junyu Zhang, Liwei Wang, Chongjie Zhang

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments show that IDAQ achieves state-of-the-art performance on the Meta-World ML1 benchmark compared to baselines with/without offline adaptation. Empirical results show that IDAQ significantly outperforms baselines with fast online adaptation, and achieves better or comparable performance than offline adaptation baselines with expert context. |
| Researcher Affiliation | Academia | ¹Institute for Interdisciplinary Information Sciences, Tsinghua University; ²Huazhong University of Science and Technology; ³Institute of Artificial Intelligence, Peking University. |
| Pseudocode | Yes | The overall algorithm of IDAQ is illustrated in Algorithm 1 (a hedged sketch of the adaptation loop follows this table). |
| Open Source Code | Yes | An open-source implementation of our algorithm is available online: https://github.com/NagisaZj/IDAQ_Public |
| Open Datasets | Yes | We extensively evaluate the performance of IDAQ in didactic problems proposed by prior work (Rakelly et al., 2019; Zhang et al., 2021) and the Meta-World ML1 benchmark with 50 tasks (Yu et al., 2020b). |
| Dataset Splits | No | For each task set, we use 40 tasks as meta-training tasks and reserve the other 10 tasks as meta-testing tasks. The paper does not explicitly mention a separate validation split (a split helper follows this table). |
| Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, memory, or the cloud instances used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as the Adam optimizer and the network architectures, but does not give version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Table 4 shows hyper-parameter settings for the task sets used in our experiments, and Table 5 shows IDAQ's hyper-parameter settings. |
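
For the pseudocode row above, the following is a minimal Python sketch of the in-distribution online adaptation loop, not the authors' Algorithm 1 (their implementation lives in the linked repository). The `env`, `policy`, `encoder`, and `value_fn` interfaces are assumptions made for illustration, and the return-gap score is used here only as a stand-in for the paper's return-based uncertainty quantification.

```python
def rollout(env, policy, z, horizon=500, gamma=0.99):
    """Collect one episode conditioned on the latent task variable z.

    Hypothetical helper: assumes the classic Gym step API and a policy
    exposing an `act(obs, z)` method; the real interfaces are in the repo.
    """
    traj, ret, discount = [], 0.0, 1.0
    obs = env.reset()
    for _ in range(horizon):
        action = policy.act(obs, z)
        next_obs, reward, done, _ = env.step(action)
        traj.append((obs, action, reward, next_obs))
        ret += discount * reward
        discount *= gamma
        obs = next_obs
        if done:
            break
    return traj, ret


def in_distribution_adaptation(env, policy, encoder, value_fn,
                               n_iters=5, n_rollouts=4, rel_tol=0.1):
    """Sketch of in-distribution online adaptation (not Algorithm 1 itself).

    Each iteration collects candidate rollouts, scores each by the gap
    between its Monte Carlo return and the value estimate under the
    current task belief, and keeps only low-gap, i.e. plausibly
    in-distribution, trajectories as context for re-inferring the belief.
    """
    context = []
    z = encoder.prior_sample()  # initial task belief before any context
    for _ in range(n_iters):
        candidates = [rollout(env, policy, z) for _ in range(n_rollouts)]
        for traj, mc_return in candidates:
            predicted = value_fn(traj[0][0], z)  # value of the start state
            gap = abs(mc_return - predicted)
            if gap <= rel_tol * max(abs(predicted), 1.0):
                context.extend(traj)  # treat rollout as in-distribution
        if context:
            z = encoder.infer(context)  # posterior belief from kept context
    return z, context
```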
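
For the dataset-splits row, which quotes only the 40/10 task split without saying how tasks are assigned, here is a hypothetical helper showing one way such a split could be reproduced. The seeded shuffle is an assumption, not the paper's procedure, and no validation split is modeled because none is reported.

```python
import random

def split_ml1_tasks(task_ids, n_train=40, seed=0):
    """Split the 50 ML1 task variants into 40 meta-training and
    10 meta-testing tasks, mirroring the split quoted above.

    Assumption: a seeded shuffle for reproducibility; the paper does
    not state its assignment rule.
    """
    rng = random.Random(seed)
    ids = list(task_ids)
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

train_tasks, test_tasks = split_ml1_tasks(range(50))
assert len(train_tasks) == 40 and len(test_tasks) == 10
```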