Model-Based Offline Planning with Trajectory Pruning

Authors: Xianyuan Zhan, Xiangyu Zhu, Haoran Xu

IJCAI 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM Response: We evaluate and compare the performance of MOPP with several state-of-the-art (SOTA) baselines on the standard offline RL benchmark D4RL [Fu et al., 2020]. We conduct experiments on the widely used MuJoCo tasks and the more complex Adroit hand manipulation tasks. All results are averaged over 5 random seeds, with 20 episode runs per seed. (Section 5, Experimental Results)

Researcher Affiliation: Collaboration
LLM Response: 1) Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China; 2) JD iCity & JD Intelligent Cities Research, JD Technology, Beijing, China

Pseudocode: Yes
LLM Response: Algorithm 1: Complete algorithm of MOPP

Open Source Code: No
LLM Response: The paper does not contain any explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository.

Open Datasets: Yes
LLM Response: We evaluate and compare the performance of MOPP with several state-of-the-art (SOTA) baselines on the standard offline RL benchmark D4RL [Fu et al., 2020].

Dataset Splits: No
LLM Response: The paper mentions evaluating on the D4RL benchmark and averaging results over "5 random seeds, with 20 episode runs per seed," but it does not specify explicit training, validation, or test splits (e.g., percentages, sample counts, or citations to predefined splits) used for model training and selection.

Hardware Specification: No
LLM Response: The paper does not provide any details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.

Software Dependencies: No
LLM Response: The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or other library versions).

Experiment Setup: Yes
LLM Response: We conduct ablation experiments on the walker2d-med-expert task to understand the impact of key elements in MOPP. We first investigate in Figure 1(a) the impact of sampling aggressiveness (controlled by the std scaling parameter σM), as well as its relationship with the max-Q operation and trajectory pruning. ... We further examine the impacts of the value function Vb and the max-Q operation on different planning horizons in Figure 1(b). ... Finally, Figure 1(c) presents the impact of the uncertainty threshold L in trajectory pruning.
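
The ablation variables quoted above (std scaling σM, planning horizon, uncertainty threshold L) all act inside MOPP's sample-then-prune planning loop: candidate action trajectories are sampled with a widened std, rolled out through a dynamics-model ensemble, trajectories whose ensemble disagreement exceeds L are discarded, and the best surviving trajectory is executed MPC-style. A minimal sketch of that loop, using toy random linear models and a disagreement-based uncertainty measure that are illustrative assumptions, not the paper's actual learned models or exact uncertainty definition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions and knobs; MOPP's real components are learned.
STATE_DIM, ACTION_DIM = 3, 2
HORIZON = 8          # planning horizon
N_TRAJ = 64          # number of sampled candidate trajectories
N_ENSEMBLE = 4       # size of the dynamics-model ensemble
SIGMA_M = 1.5        # std scaling: >1 samples more aggressively than the data
L_THRESHOLD = 0.5    # uncertainty threshold L for trajectory pruning

# Stand-in dynamics ensemble: fixed random linear models s' = A s + B a.
A = [rng.normal(scale=0.3, size=(STATE_DIM, STATE_DIM)) for _ in range(N_ENSEMBLE)]
B = [rng.normal(scale=0.3, size=(STATE_DIM, ACTION_DIM)) for _ in range(N_ENSEMBLE)]

def reward(s, a):
    # Toy reward: stay near the origin with small actions.
    return -float(np.sum(s**2) + 0.1 * np.sum(a**2))

def rollout(s0, actions):
    """Roll one action sequence through every ensemble member.

    Returns (total_return, max_disagreement), where disagreement is the
    largest per-step spread between the members' next-state predictions."""
    states = [s0.copy() for _ in range(N_ENSEMBLE)]
    total, max_disagree = 0.0, 0.0
    for a in actions:
        preds = [A[k] @ states[k] + B[k] @ a for k in range(N_ENSEMBLE)]
        spread = float(np.max(np.std(np.stack(preds), axis=0)))
        max_disagree = max(max_disagree, spread)
        total += reward(preds[0], a)   # score the return under one member
        states = preds
    return total, max_disagree

def plan(s0):
    """Sample candidates with boosted std, prune uncertain ones, keep the best."""
    best_action, best_ret = None, -np.inf
    for _ in range(N_TRAJ):
        # Behavior-policy stand-in: Gaussian actions with std scaled by SIGMA_M
        # to search more aggressively than the (implicit) behavior data.
        actions = rng.normal(scale=0.2 * SIGMA_M, size=(HORIZON, ACTION_DIM))
        ret, unc = rollout(s0, actions)
        if unc > L_THRESHOLD:
            continue  # trajectory pruning: drop out-of-distribution rollouts
        if ret > best_ret:
            best_ret, best_action = ret, actions[0]
    return best_action  # MPC-style: execute only the first action

first_action = plan(np.zeros(STATE_DIM))  # None if every candidate was pruned
```

This makes the ablation trade-offs concrete: raising SIGMA_M widens the search but produces more rollouts that trip the pruning threshold, while lowering L_THRESHOLD keeps planning closer to the data at the cost of discarding more candidates.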