Model-Based Offline Planning with Trajectory Pruning

Authors: Xianyuan Zhan, Xiangyu Zhu, Haoran Xu

IJCAI 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM Response: We evaluate and compare the performance of MOPP with several state-of-the-art (SOTA) baselines on the standard offline RL benchmark D4RL [Fu et al., 2020]. We conduct experiments on the widely used MuJoCo tasks and the more complex Adroit hand manipulation tasks. All results are averaged over 5 random seeds, with 20 episode runs per seed. (Section 5, Experimental Results)

Researcher Affiliation: Collaboration
LLM Response: 1) Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China; 2) JD iCity & JD Intelligent Cities Research, JD Technology, Beijing, China

Pseudocode: Yes
LLM Response: Algorithm 1: Complete algorithm of MOPP

Open Source Code: No
LLM Response: The paper does not contain any explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository.

Open Datasets: Yes
LLM Response: We evaluate and compare the performance of MOPP with several state-of-the-art (SOTA) baselines on the standard offline RL benchmark D4RL [Fu et al., 2020].

Dataset Splits: No
LLM Response: The paper mentions evaluating on the D4RL benchmark and averaging results over "5 random seeds, with 20 episode runs per seed," but it does not specify explicit training, validation, or test splits (e.g., percentages, sample counts, or citations to predefined splits) used for model training and selection.

Hardware Specification: No
LLM Response: The paper does not provide any details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.

Software Dependencies: No
LLM Response: The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or other library versions).

Experiment Setup: Yes
LLM Response: We conduct ablation experiments on the walker2d-med-expert task to understand the impact of key elements in MOPP. We first investigate in Figure 1(a) the impact of sampling aggressiveness (controlled by the std scaling parameter σM), as well as its relationship with the max-Q operation and trajectory pruning. ... We further examine the impacts of the value function Vb and the max-Q operation on different planning horizons in Figure 1(b). ... Finally, Figure 1(c) presents the impact of the uncertainty threshold L in trajectory pruning.
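
The ablation variables quoted above (std scaling σM, planning horizon, uncertainty threshold L) all act inside MOPP's sample-then-prune planning loop: candidate action trajectories are sampled with a widened std, rolled out through a dynamics-model ensemble, trajectories whose ensemble disagreement exceeds L are discarded, and the best surviving trajectory is executed MPC-style. A minimal sketch of that loop, using toy random linear models and a disagreement-based uncertainty measure that are illustrative assumptions, not the paper's actual learned models or exact uncertainty definition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions and knobs; MOPP's real components are learned.
STATE_DIM, ACTION_DIM = 3, 2
HORIZON = 8          # planning horizon
N_TRAJ = 64          # number of sampled candidate trajectories
N_ENSEMBLE = 4       # size of the dynamics-model ensemble
SIGMA_M = 1.5        # std scaling: >1 samples more aggressively than the data
L_THRESHOLD = 0.5    # uncertainty threshold L for trajectory pruning

# Stand-in dynamics ensemble: fixed random linear models s' = A s + B a.
A = [rng.normal(scale=0.3, size=(STATE_DIM, STATE_DIM)) for _ in range(N_ENSEMBLE)]
B = [rng.normal(scale=0.3, size=(STATE_DIM, ACTION_DIM)) for _ in range(N_ENSEMBLE)]

def reward(s, a):
    # Toy reward: stay near the origin with small actions.
    return -float(np.sum(s**2) + 0.1 * np.sum(a**2))

def rollout(s0, actions):
    """Roll one action sequence through every ensemble member.

    Returns (total_return, max_disagreement), where disagreement is the
    largest per-step spread between the members' next-state predictions."""
    states = [s0.copy() for _ in range(N_ENSEMBLE)]
    total, max_disagree = 0.0, 0.0
    for a in actions:
        preds = [A[k] @ states[k] + B[k] @ a for k in range(N_ENSEMBLE)]
        spread = float(np.max(np.std(np.stack(preds), axis=0)))
        max_disagree = max(max_disagree, spread)
        total += reward(preds[0], a)   # score the return under one member
        states = preds
    return total, max_disagree

def plan(s0):
    """Sample candidates with boosted std, prune uncertain ones, keep the best."""
    best_action, best_ret = None, -np.inf
    for _ in range(N_TRAJ):
        # Behavior-policy stand-in: Gaussian actions with std scaled by SIGMA_M
        # to search more aggressively than the (implicit) behavior data.
        actions = rng.normal(scale=0.2 * SIGMA_M, size=(HORIZON, ACTION_DIM))
        ret, unc = rollout(s0, actions)
        if unc > L_THRESHOLD:
            continue  # trajectory pruning: drop out-of-distribution rollouts
        if ret > best_ret:
            best_ret, best_action = ret, actions[0]
    return best_action  # MPC-style: execute only the first action

first_action = plan(np.zeros(STATE_DIM))  # None if every candidate was pruned
```

This makes the ablation trade-offs concrete: raising SIGMA_M widens the search but produces more rollouts that trip the pruning threshold, while lowering L_THRESHOLD keeps planning closer to the data at the cost of discarding more candidates.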