Exploring Model-based Planning with Policy Networks
Authors: Tingwu Wang, Jimmy Ba
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with both optimization w.r.t. the action sequences initialized from the policy network, and also online optimization directly w.r.t. the parameters of the policy network. We show that in the MuJoCo benchmarking environments, POPLIN is about 3x more sample efficient than the previously state-of-the-art algorithms, such as PETS, TD3 and SAC. Section 5 is titled “EXPERIMENTS” and discusses performance comparisons. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, University of Toronto; ²Vector Institute; {tingwuwang,jba}@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1 General POPLIN Framework. Algorithm 2 POPLIN-A-Init. Algorithm 3 POPLIN-A-Replan. Algorithm 4 POPLIN-P. |
| Open Source Code | Yes | Code is released here (footnote 1: https://github.com/WilsonWangTHU/POPLIN). |
| Open Datasets | Yes | We examine the algorithms with 12 environments, which is a wide collection of environments from OpenAI Gym (Brockman et al., 2016) and the environments proposed in PETS (Chua et al., 2018), which are summarized in appendix A.2. |
| Dataset Splits | No | The paper describes training time-steps and performance evaluation, but does not provide explicit training, validation, or test dataset splits in terms of percentages or sample counts for the environments used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU, GPU models, or memory specifications). |
| Software Dependencies | No | The paper mentions using OpenAI Gym and MuJoCo environments, but does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow, or other libraries). |
| Experiment Setup | Yes | In section A.3.1, 'HYPER-PARAMETERS', the paper lists specific hyper-parameter grid search options for PETS, POPLIN-A, and POPLIN-P, including Population Size, Planning Horizon, Initial Distribution Sigma, CEM Iterations, and Elite Size. It also states: 'for all of the experiments on PETS, POPLIN, we use the model type PE (probabilistic ensembles) and propagation method of E (expectation).' |
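
The hyper-parameters named in the Experiment Setup row (Population Size, Planning Horizon, Initial Distribution Sigma, CEM Iterations, Elite Size) are the usual knobs of a cross-entropy method (CEM) planner. The sketch below is a generic, minimal CEM loop that shows where each knob enters. It is not the authors' implementation: `cem_plan` and `toy_cost` are hypothetical names, actions are assumed to be scalars, the cost is a toy quadratic rather than a learned dynamics model, the default values are illustrative rather than taken from the paper's grid, and the POPLIN-specific policy-network initialization is omitted.

```python
import random


def cem_plan(cost_fn, horizon, population_size=100, elite_size=10,
             init_sigma=1.0, iterations=5, seed=0):
    """Generic CEM over a length-`horizon` sequence of scalar actions.

    Hypothetical helper for illustration only; not the POPLIN code.
    """
    rng = random.Random(seed)
    mean = [0.0] * horizon           # initial mean of the sampling distribution
    sigma = [init_sigma] * horizon   # "Initial Distribution Sigma"
    for _ in range(iterations):      # "CEM Iterations"
        # Sample `population_size` candidate action sequences.
        candidates = [
            [rng.gauss(mean[t], sigma[t]) for t in range(horizon)]
            for _ in range(population_size)
        ]
        # Keep the `elite_size` lowest-cost candidates.
        elites = sorted(candidates, key=cost_fn)[:elite_size]
        # Refit the per-step mean and standard deviation to the elites.
        for t in range(horizon):
            values = [seq[t] for seq in elites]
            mean[t] = sum(values) / elite_size
            variance = sum((v - mean[t]) ** 2 for v in values) / elite_size
            sigma[t] = variance ** 0.5
    return mean


def toy_cost(actions):
    # Toy objective (an assumption): minimized when every action equals 0.5.
    return sum((a - 0.5) ** 2 for a in actions)


plan = cem_plan(toy_cost, horizon=3)
```

In a model-based setting, `cost_fn` would instead roll the candidate action sequence through a learned dynamics model and return the negative predicted return.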