Model-Based Offline Planning
Authors: Arthur Argenson, Gabriel Dulac-Arnold
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the performance of our algorithm, Model-Based Offline Planning (MBOP) on a series of robotics-inspired tasks, and demonstrate its ability to leverage planning to respect environmental constraints. |
| Researcher Affiliation | Industry | Arthur Argenson (aarg@google.com), Google Research; Gabriel Dulac-Arnold (dulacarnold@google.com), Google Research |
| Pseudocode | Yes | Algorithm 1 High-Level MBOP-Policy; Algorithm 2 MBOP-Trajopt |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code for the described methodology. It only mentions that 'Accompanying videos are available here' and 'All non-standard datasets will be available publicly'. |
| Open Datasets | Yes | We use standard datasets from the RL Unplugged (RLU) (Gulcehre et al., 2020) and D4RL (Fu et al., 2020) papers. |
| Dataset Splits | Yes | On all datasets, training is performed on 90% of data and 10% is used for validation. |
| Hardware Specification | Yes | We calculate the average control frequency of MBOP on the RLU Walker task using a single Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz core and a Nvidia 1080TI ... Execution speeds on the RLU Walker task are represented in Table 9. ... on a Tesla P100 using a single core of a Xeon 2200 MHz equivalent processor. |
| Software Dependencies | No | The paper describes the software components (e.g., neural networks) but does not provide specific version numbers for any libraries, frameworks, or environments used. |
| Experiment Setup | Yes | The full set of parameters for each experiment can be found in the Appendix Sec. 5.2. ... # FC Layers: 2; Size FC Layers: 500; # Ensemble Networks: 3; Learning Rate: 0.001; Batch Size: 512; # Epochs: 40 |
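
The Dataset Splits and Experiment Setup rows can be read together as a small training configuration. The following is a minimal sketch, not the authors' code: the numeric values are the ones quoted in the table, while the class name `MBOPTrainConfig`, its field names, the helper `split_indices`, and the seeded shuffle before splitting are all assumptions made for illustration. The paper states only that 90% of the data is used for training and 10% for validation.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class MBOPTrainConfig:
    """Hypothetical container; values are quoted from the table above."""
    num_fc_layers: int = 2           # "# FC Layers"
    fc_layer_size: int = 500         # "Size FC Layers"
    num_ensemble_networks: int = 3   # "# Ensemble Networks"
    learning_rate: float = 0.001     # "Learning Rate"
    batch_size: int = 512            # "Batch Size"
    num_epochs: int = 40             # "# Epochs"
    train_fraction: float = 0.9      # 90% training, 10% validation


def split_indices(num_samples, train_fraction=0.9, seed=0):
    """Return (train, validation) index lists for a dataset of num_samples.

    Shuffling with a fixed seed is an assumption; the source does not say
    how the 90/10 split was drawn.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    cut = round(num_samples * train_fraction)
    return indices[:cut], indices[cut:]


config = MBOPTrainConfig()
train_idx, val_idx = split_indices(1000, config.train_fraction)
```

With 1000 samples this yields 900 training indices and 100 validation indices, with no overlap between the two lists.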