Control-Aware Representations for Model-based Reinforcement Learning

Authors: Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh

ICLR 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | We evaluate the proposed algorithms by extensive experiments on benchmark tasks and compare them with several LCE baselines. In this section, we experiment with the following continuous control domains: (i) Planar System, (ii) Inverted Pendulum (Swingup), (iii) Cartpole, (iv) Three-link Manipulator (3-Pole), and compare the performance of our CARL algorithms with three LCE baselines: PCC (Levine et al., 2020), SOLAR (Zhang et al., 2019), SLAC (Lee et al., 2020), and two implementations of Dreamer (Hafner et al., 2020a) (described below).

Researcher Affiliation | Industry | Brandon Cui (Facebook AI Research, bcui@fb.com); Yinlam Chow (Google Research, yinlamchow@google.com); Mohammad Ghavamzadeh (Google Research, ghavamza@google.com)

Pseudocode | Yes | Algorithm 1: Latent Space Learning with Policy Iteration (LSLPI)

Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository for the methodology described.

Open Datasets | Yes | In this section, we experiment with the following continuous control domains: (i) Planar System, (ii) Inverted Pendulum (Swingup), (iii) Cartpole, and (iv) Three-link Manipulator (3-Pole) and compare the performance of our CARL algorithms with three LCE baselines. We report the detailed setup of the experiments in Appendix E, in particular, the description of the domains in Appendix E.1 and the implementation of the algorithms in Appendix E.3.

Dataset Splits | No | The paper describes generating data through interaction with continuous control domains and mentions training multiple models and performing control tasks, but it does not specify explicit training, validation, or test dataset splits in percentages or sample counts.

Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.

Software Dependencies | No | The paper mentions algorithms like Soft Actor-Critic (SAC) but does not provide specific version numbers for software libraries, programming languages, or other dependencies used in the implementation or experimentation.

Experiment Setup | Yes | We report the detailed setup of the experiments in Appendix E, in particular, the description of the domains in Appendix E.1 and the implementation of the algorithms in Appendix E.3.