Offline Model-based Adaptable Policy Learning
Authors: Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on MuJoCo controlling tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieve better performance than SOTA algorithms. |
| Researcher Affiliation | Collaboration | (1) National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; (2) AI Labs, Didi Chuxing; (3) Polixir.ai |
| Pseudocode | Yes | Algorithm 1 Offline model-based adaptable policy learning |
| Open Source Code | Yes | We release our code at GitHub: https://github.com/xionghuichen/MAPLE |
| Open Datasets | Yes | We evaluate MAPLE on multiple offline MuJoCo tasks [16]. We test MAPLE in standard offline RL tasks with D4RL datasets [30]. (A hedged dataset-loading sketch follows the table.) |
| Dataset Splits | No | The paper uses an offline dataset but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or explicit standard benchmark splits). |
| Hardware Specification | Yes | For example, by using NVIDIA Tesla P40 and Xeon(R) E5-2630 to train the algorithms, the time overhead of MAPLE-200 is 10 times longer than MAPLE. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All the details of MAPLE's training and evaluation are given in Appendix E and Appendix F. The horizon H is set to 10 in these tasks. The policy is trained for 1000 iterations in the policy learning stage. (A toy sketch of this training loop follows the table.) |
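To make the "Open Datasets" row concrete, below is a minimal sketch of loading one of the D4RL MuJoCo datasets referenced above. The specific task name `halfcheetah-medium-v2` is an illustrative choice, not a claim about which dataset versions the paper used.

```python
import gym
import d4rl  # importing d4rl registers the offline MuJoCo environments with gym

# Illustrative task choice; the paper evaluates on several D4RL MuJoCo datasets.
env = gym.make("halfcheetah-medium-v2")

# Returns a dict of offline transitions: observations, actions, rewards,
# next_observations, terminals.
dataset = d4rl.qlearning_dataset(env)
print(dataset["observations"].shape, dataset["actions"].shape)
```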
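For the "Pseudocode" and "Experiment Setup" rows, the sketch below illustrates the control flow suggested by Algorithm 1 and the quoted settings (an ensemble of learned dynamics models, short model rollouts with horizon H = 10, 1000 policy-learning iterations). It is a toy stand-in, not the released MAPLE implementation: `ToyModel`, `ToyPolicy`, the per-step random choice of ensemble member, and the moving-average context are assumptions used only to show the structure.

```python
# Toy sketch of an offline model-based adaptable policy learning loop, under
# assumptions: ToyModel and ToyPolicy are illustrative stand-ins, and the
# SAC-style policy update is reduced to a no-op placeholder.
import numpy as np

rng = np.random.default_rng(0)

class ToyModel:
    """Stand-in for one member of the learned dynamics-model ensemble."""
    def __init__(self):
        self.A = rng.normal(scale=0.05, size=(3, 3))
    def step(self, s, a):
        s_next = s + self.A @ s + 0.1 * a
        return s_next, -float(np.sum(s_next ** 2))    # (next state, reward)

class ToyPolicy:
    """Stand-in adaptable policy; `context` summarizes the rollout so far."""
    def act(self, s, context):
        return np.tanh(s + context)
    def update(self, batch):
        pass                                          # SAC-style update omitted

ensemble = [ToyModel() for _ in range(7)]             # stand-in for models fit on offline data
policy, horizon, buffer = ToyPolicy(), 10, []         # H = 10 as in the paper

for iteration in range(1000):                         # 1000 policy-learning iterations
    s = rng.normal(size=3)                            # start state (would come from the dataset)
    context = np.zeros(3)                             # context, reset at the start of each rollout
    for t in range(horizon):
        a = policy.act(s, context)
        model = ensemble[rng.integers(len(ensemble))] # random ensemble member per step (assumption)
        s_next, r = model.step(s, a)
        buffer.append((s, a, r, s_next, context))
        context = 0.9 * context + 0.1 * (s_next - s)  # toy stand-in for an RNN context encoder
        s = s_next
    policy.update(buffer)
```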