Model-Based Offline Planning with Trajectory Pruning
Authors: Xianyuan Zhan, Xiangyu Zhu, Haoran Xu
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, Experimental Results: We evaluate and compare the performance of MOPP with several state-of-the-art (SOTA) baselines on standard offline RL benchmark D4RL [Fu et al., 2020]. We conduct experiments on the widely-used MuJoCo tasks and the more complex Adroit hand manipulation tasks. All results are averaged based on 5 random seeds, with 20 episode runs per seed. |
| Researcher Affiliation | Collaboration | (1) Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China; (2) JD iCity & JD Intelligent Cities Research, JD Technology, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Complete algorithm of MOPP |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We evaluate and compare the performance of MOPP with several state-of-the-art (SOTA) baselines on standard offline RL benchmark D4RL [Fu et al., 2020]. |
| Dataset Splits | No | The paper mentions evaluating on the D4RL benchmark and averaging results over '5 random seeds, with 20 episode runs per seed,' but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) used for model training and selection. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., PyTorch version, Python version, specific library versions). |
| Experiment Setup | Yes | We conduct ablation experiments on walker2d-med-expert task to understand the impact of key elements in MOPP. We first investigate in Figure 1(a) the impact of sampling aggressiveness (controlled by std scaling parameter σ_M), as well as its relationship with the max-Q operation and trajectory pruning. ... We further examine the impacts of value function V_b and max-Q operation on different planning horizons in Figure 1(b). ... Finally, Figure 1(c) presents the impact of uncertainty threshold L in trajectory pruning. |