Control-Aware Representations for Model-based Reinforcement Learning
Authors: Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed algorithms by extensive experiments on benchmark tasks and compare them with several LCE baselines. In this section, we experiment with the following continuous control domains: (i) Planar System, (ii) Inverted Pendulum (Swingup), (iii) Cartpole, (iv) Three-link Manipulator (3-Pole), and compare the performance of our CARL algorithms with three LCE baselines: PCC (Levine et al., 2020), SOLAR (Zhang et al., 2019), SLAC (Lee et al., 2020), and two implementations of Dreamer (Hafner et al., 2020a) (described below). |
| Researcher Affiliation | Industry | Brandon Cui (Facebook AI Research, bcui@fb.com); Yinlam Chow (Google Research, yinlamchow@google.com); Mohammad Ghavamzadeh (Google Research, ghavamza@google.com) |
| Pseudocode | Yes | Algorithm 1: Latent Space Learning with Policy Iteration (LSLPI) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | In this section, we experiment with the following continuous control domains: (i) Planar System, (ii) Inverted Pendulum (Swingup), (iii) Cartpole, and (iv) Three-link Manipulator (3-Pole) and compare the performance of our CARL algorithms with three LCE baselines. We report the detailed setup of the experiments in Appendix E, in particular, the description of the domains in Appendix E.1 and the implementation of the algorithms in Appendix E.3. |
| Dataset Splits | No | The paper describes generating data through interaction with continuous control domains and mentions training multiple models and performing control tasks, but it does not specify explicit training, validation, or test dataset splits in percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms like Soft Actor-Critic (SAC) but does not provide specific version numbers for software libraries, programming languages, or other dependencies used in the implementation or experimentation. |
| Experiment Setup | Yes | We report the detailed setup of the experiments in Appendix E, in particular, the description of the domains in Appendix E.1 and the implementation of the algorithms in Appendix E.3. |