Offline Model-based Adaptable Policy Learning

Authors: Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on MuJoCo control tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieve better performance than SOTA algorithms.
Researcher Affiliation | Collaboration | 1 National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; 2 AI Labs, Didi Chuxing; 3 Polixir.ai
Pseudocode | Yes | Algorithm 1: Offline model-based adaptable policy learning (a schematic sketch of this loop is given after the table).
Open Source Code | Yes | We release our code on GitHub: https://github.com/xionghuichen/MAPLE
Open Datasets | Yes | We evaluate MAPLE on multiple offline MuJoCo tasks [16]. We test MAPLE in standard offline RL tasks with D4RL datasets [30] (see the loading snippet after the table).
Dataset Splits | No | The paper uses an offline dataset but does not explicitly provide train/validation/test splits (e.g., percentages, sample counts, or standard benchmark splits).
Hardware Specification | Yes | For example, by using NVIDIA Tesla P40 and Xeon(R) E5-2630 to train the algorithms, the time overhead of MAPLE-200 is 10 times longer than that of MAPLE.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | All the details of MAPLE's training and evaluation are given in Appendix E and Appendix F. The horizon H is set to 10 in these tasks. The policy is trained for 1000 iterations in the policy learning stage (reflected in the sketch after the table).
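The D4RL datasets referenced above can typically be loaded with the `d4rl` Python package. The snippet below is a minimal sketch, not taken from the MAPLE repository, and the task name `halfcheetah-medium-v2` is only an illustrative choice.

```python
# Minimal, hypothetical example of loading a D4RL offline MuJoCo dataset.
# Assumes the `gym` and `d4rl` packages are installed; MAPLE's actual data
# pipeline lives in the released repository and may differ.
import gym
import d4rl  # importing d4rl registers the offline environments with gym

env = gym.make("halfcheetah-medium-v2")  # illustrative task name
dataset = d4rl.qlearning_dataset(env)    # dict with observations, actions, rewards, ...

print(dataset["observations"].shape, dataset["actions"].shape)
```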
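For a concrete picture of Algorithm 1, the sketch below outlines the policy-learning stage at a schematic level, using the settings reported in the paper (rollout horizon H = 10, 1000 training iterations). Every helper object here (the dynamics ensemble, context encoder, buffers, and `sac_update`) is a hypothetical placeholder for components in the released code, not the authors' implementation.

```python
# Schematic sketch of offline model-based adaptable policy learning (Algorithm 1).
# All components are hypothetical placeholders; only the loop structure and the
# reported hyperparameters (H = 10, 1000 iterations) come from the paper.
import random

def maple_policy_learning(dynamics_ensemble,   # list of learned dynamics models
                          context_encoder,     # RNN mapping transition history -> context z
                          policy,              # adaptable policy pi(a | s, z)
                          sac_update,          # one actor-critic update on a batch
                          offline_buffer,      # replay buffer built from the offline dataset
                          model_buffer,        # buffer of model-generated rollouts
                          H=10,                # rollout horizon reported in the paper
                          iterations=1000):    # policy-learning iterations reported in the paper
    for _ in range(iterations):
        # Branch a short rollout from a state in the offline data, stepping a
        # randomly chosen member of the dynamics ensemble at each step.
        state = offline_buffer.sample_start_state()
        hidden = context_encoder.initial_state()
        for _ in range(H):
            model = random.choice(dynamics_ensemble)
            z, hidden = context_encoder.step(state, hidden)   # environment context
            action = policy.act(state, z)
            next_state, reward, done = model.step(state, action)
            model_buffer.add(state, action, reward, next_state, done)
            if done:
                break
            state = next_state

        # Update the adaptable policy (e.g., with SAC) on a mixture of real and
        # model-generated transitions.
        batch = offline_buffer.sample_batch() + model_buffer.sample_batch()
        sac_update(policy, context_encoder, batch)
```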