reproducibilityindex.ai

Exploring Model-based Planning with Policy Networks

Authors: Tingwu Wang, Jimmy Ba

ICLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We experiment with both optimization w.r.t. the action sequences initialized from the policy network, and also online optimization directly w.r.t. the parameters of the policy network. We show that in the MuJoCo benchmarking environments, POPLIN is about 3x more sample efficient than the previously state-of-the-art algorithms, such as PETS, TD3 and SAC. Section 5 is titled “EXPERIMENTS” and discusses performance comparisons.
Researcher Affiliation	Academia	1. Department of Computer Science, University of Toronto; 2. Vector Institute. {tingwuwang,jba}@cs.toronto.edu
Pseudocode	Yes	Algorithm 1 General POPLIN Framework. Algorithm 2 POPLIN-A-Init. Algorithm 3 POPLIN-A-Replan. Algorithm 4 POPLIN-P.
Open Source Code	Yes	Code is released here (footnote 1). Footnote 1: https://github.com/WilsonWangTHU/POPLIN
Open Datasets	Yes	We examine the algorithms with 12 environments, which is a wide collection of environments from OpenAI Gym (Brockman et al., 2016) and the environments proposed in PETS (Chua et al., 2018), which are summarized in appendix A.2.
Dataset Splits	No	The paper describes training time-steps and performance evaluation, but does not provide explicit training, validation, or test dataset splits in terms of percentages or sample counts for the environments used.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU, GPU models, or memory specifications).
Software Dependencies	No	The paper mentions using OpenAI Gym and MuJoCo environments, but does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow, or other libraries).
Experiment Setup	Yes	In section A.3.1, 'HYPER-PARAMETERS', the paper lists specific hyper-parameter grid search options for PETS, POPLIN-A, and POPLIN-P, including Population Size, Planning Horizon, Initial Distribution Sigma, CEM Iterations, and Elite Size. It also states: 'for all of the experiments on PETS, POPLIN, we use the model type PE (probabilistic ensembles) and propagation method of E (expectation).'
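
To make the hyper-parameters named in the Experiment Setup row concrete, here is a minimal sketch of a plain cross-entropy method (CEM) planner over action sequences, the kind of search those settings control. It is not the authors' implementation: the function names `cem_plan` and `toy_cost`, the toy cost, and every default value are assumptions chosen for illustration, and the sketch omits the learned dynamics ensemble and the policy-network initialization that POPLIN adds.

```python
import numpy as np


def cem_plan(cost_fn, action_dim, horizon=30, population_size=500,
             elite_size=50, cem_iterations=5, init_sigma=0.5, seed=0):
    """Plain CEM over open-loop action sequences (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    # Initial sampling distribution: zero mean, "Initial Distribution Sigma".
    mean = np.zeros((horizon, action_dim))
    sigma = np.full((horizon, action_dim), init_sigma)
    for _ in range(cem_iterations):
        # Draw "Population Size" candidate sequences of length "Planning Horizon".
        noise = rng.standard_normal((population_size, horizon, action_dim))
        candidates = mean + sigma * noise
        costs = np.array([cost_fn(c) for c in candidates])
        # Keep the "Elite Size" lowest-cost candidates and refit the Gaussian.
        elites = candidates[np.argsort(costs)[:elite_size]]
        mean = elites.mean(axis=0)
        sigma = elites.std(axis=0)
    return mean


def toy_cost(actions):
    # Hypothetical stand-in for a model rollout: distance to a constant action.
    return float(np.sum((actions - 0.3) ** 2))


plan = cem_plan(toy_cost, action_dim=2, horizon=5)
```

In a model-based setting, `cost_fn` would instead roll the candidate sequence through the learned dynamics model and return the negative predicted return.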