Offline Model-based Adaptable Policy Learning

Authors: Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on MuJoCo control tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieve better performance than SOTA algorithms.
Researcher Affiliation | Collaboration | 1 National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; 2 AI Labs, Didi Chuxing; 3 Polixir.ai
Pseudocode | Yes | Algorithm 1: Offline model-based adaptable policy learning (a schematic sketch of this loop is given after the table).
Open Source Code | Yes | We release our code on GitHub: https://github.com/xionghuichen/MAPLE
Open Datasets | Yes | We evaluate MAPLE on multiple offline MuJoCo tasks [16]. We test MAPLE in standard offline RL tasks with D4RL datasets [30] (see the loading snippet after the table).
Dataset Splits | No | The paper uses an offline dataset but does not explicitly provide train/validation/test splits (e.g., percentages, sample counts, or standard benchmark splits).
Hardware Specification | Yes | For example, by using NVIDIA Tesla P40 and Xeon(R) E5-2630 to train the algorithms, the time overhead of MAPLE-200 is 10 times longer than that of MAPLE.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | All the details of MAPLE's training and evaluation are given in Appendix E and Appendix F. The horizon H is set to 10 in these tasks. The policy is trained for 1000 iterations in the policy learning stage (reflected in the sketch after the table).
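The D4RL datasets referenced above can typically be loaded with the `d4rl` Python package. The snippet below is a minimal sketch, not taken from the MAPLE repository, and the task name `halfcheetah-medium-v2` is only an illustrative choice.

```python
# Minimal, hypothetical example of loading a D4RL offline MuJoCo dataset.
# Assumes the `gym` and `d4rl` packages are installed; MAPLE's actual data
# pipeline lives in the released repository and may differ.
import gym
import d4rl  # importing d4rl registers the offline environments with gym

env = gym.make("halfcheetah-medium-v2")  # illustrative task name
dataset = d4rl.qlearning_dataset(env)    # dict with observations, actions, rewards, ...

print(dataset["observations"].shape, dataset["actions"].shape)
```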
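For a concrete picture of Algorithm 1, the sketch below outlines the policy-learning stage at a schematic level, using the settings reported in the paper (rollout horizon H = 10, 1000 training iterations). Every helper object here (the dynamics ensemble, context encoder, buffers, and `sac_update`) is a hypothetical placeholder for components in the released code, not the authors' implementation.

```python
# Schematic sketch of offline model-based adaptable policy learning (Algorithm 1).
# All components are hypothetical placeholders; only the loop structure and the
# reported hyperparameters (H = 10, 1000 iterations) come from the paper.
import random

def maple_policy_learning(dynamics_ensemble,   # list of learned dynamics models
                          context_encoder,     # RNN mapping transition history -> context z
                          policy,              # adaptable policy pi(a | s, z)
                          sac_update,          # one actor-critic update on a batch
                          offline_buffer,      # replay buffer built from the offline dataset
                          model_buffer,        # buffer of model-generated rollouts
                          H=10,                # rollout horizon reported in the paper
                          iterations=1000):    # policy-learning iterations reported in the paper
    for _ in range(iterations):
        # Branch a short rollout from a state in the offline data, stepping a
        # randomly chosen member of the dynamics ensemble at each step.
        state = offline_buffer.sample_start_state()
        hidden = context_encoder.initial_state()
        for _ in range(H):
            model = random.choice(dynamics_ensemble)
            z, hidden = context_encoder.step(state, hidden)   # environment context
            action = policy.act(state, z)
            next_state, reward, done = model.step(state, action)
            model_buffer.add(state, action, reward, next_state, done)
            if done:
                break
            state = next_state

        # Update the adaptable policy (e.g., with SAC) on a mixture of real and
        # model-generated transitions.
        batch = offline_buffer.sample_batch() + model_buffer.sample_batch()
        sac_update(policy, context_encoder, batch)
```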