Regularizing Trajectory Optimization with Denoising Autoencoders
Authors: Rinu Boney, Norman Di Palo, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, Harri Valpola
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the effect of the proposed regularization for control in standard Mujoco environments: Cartpole, Reacher, Pusher, Half-cheetah and Ant available in [4]. See the description of the environments in Appendix B. We use the Probabilistic Ensembles with Trajectory Sampling (PETS) model from [5] as the baseline, which achieves the best reported results on all the considered tasks except for Ant. ... The learning progress of the compared algorithms is presented in Fig. 4. |
| Researcher Affiliation | Collaboration | Rinu Boney (Aalto University & Curious AI, rinu.boney@aalto.fi); Norman Di Palo (Sapienza University of Rome, normandipalo@gmail.com); Mathias Berglund (Curious AI); Alexander Ilin (Aalto University & Curious AI); Juho Kannala (Aalto University); Antti Rasmus (Curious AI); Harri Valpola (Curious AI) |
| Pseudocode | Yes | Algorithm 1: End-to-end model-based reinforcement learning (a minimal sketch of this loop is given after the table) |
| Open Source Code | No | The paper mentions: 'Videos of our agents during training can be found at https://sites.google.com/view/regularizing-mbrl-with-dae/home.' This link is for videos, not for the source code of the method. There is no explicit statement or link indicating that the code for the proposed method is publicly available. |
| Open Datasets | Yes | We show the effect of the proposed regularization for control in standard Mujoco environments: Cartpole, Reacher, Pusher, Half-cheetah and Ant available in [4]. |
| Dataset Splits | No | The paper describes collecting data ('Collect data D by random policy') and training models, but it does not specify explicit train/validation/test splits (percentages, sample counts, or citations to predefined splits) for the data collected from the environment. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Mujoco environments' and refers to specific algorithms such as 'Adam' and the 'cross-entropy method (CEM)', but it does not list version-pinned software dependencies (e.g., 'Python 3.x, PyTorch 1.x') needed for reproducibility. |
| Experiment Setup | Yes | For all environments, we use a dynamics model with the same architecture: three hidden layers of size 200 with the Swish non-linearity [26]. ... We train the dynamics model for 100 or more epochs (see Appendix C) after every episode. ... Important hyperparameters used in our experiments are reported in the Appendix C. For DAE-regularized trajectory optimization we used either CEM or Adam as optimizers. (Hedged sketches of the reported architecture and the regularized objective follow the table.) |
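The Pseudocode row refers to the paper's Algorithm 1 (end-to-end model-based reinforcement learning). Below is a minimal Python sketch of that loop under stated assumptions: a classic Gym-style environment whose `step` returns a 4-tuple, and caller-supplied callables `random_policy`, `train_dynamics`, `train_dae`, and `plan_fn`, which are hypothetical stand-ins for the components the paper describes, not its actual API.

```python
def run_mbrl(env, n_episodes, seed_episodes, random_policy,
             train_dynamics, train_dae, plan_fn):
    """End-to-end model-based RL in the spirit of the paper's Algorithm 1.
    Every callable is a hypothetical stand-in supplied by the caller."""
    data = []
    # "Collect data D by random policy" -- seed the dataset before training.
    for _ in range(seed_episodes):
        state, done = env.reset(), False
        while not done:
            action = random_policy(state)
            next_state, reward, done, _ = env.step(action)
            data.append((state, action, reward, next_state))
            state = next_state
    for _ in range(n_episodes):
        dynamics = train_dynamics(data)   # retrained after every episode
        dae = train_dae(data)             # DAE over (state, action) pairs
        state, done = env.reset(), False
        while not done:
            plan = plan_fn(state, dynamics, dae)  # CEM- or Adam-based planner
            action = plan[0]                      # MPC: execute only the first action
            next_state, reward, done, _ = env.step(action)
            data.append((state, action, reward, next_state))
            state = next_state
    return dynamics, dae
```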
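The Experiment Setup quote pins down the dynamics network: three hidden layers of size 200 with the Swish non-linearity. Here is a PyTorch sketch of a single deterministic network with that shape; note the paper's PETS baseline uses a probabilistic ensemble, so this is an illustrative stand-in only, and `state_dim`/`action_dim` are assumed inputs.

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Swish non-linearity: x * sigmoid(x)."""
    def forward(self, x):
        return x * torch.sigmoid(x)

class DynamicsModel(nn.Module):
    """Deterministic stand-in matching the reported shape: three hidden
    layers of size 200 with Swish. Input/output dimensions are assumptions."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 200), Swish(),
            nn.Linear(200, 200), Swish(),
            nn.Linear(200, 200), Swish(),
            nn.Linear(200, state_dim),  # predicts the next state (or a delta)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))
```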
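Finally, a hedged sketch of what "DAE-regularized trajectory optimization" with Adam could look like: the planner minimizes negative predicted return plus a denoising penalty `||g(x) - x||^2` on each (state, action) pair, which discourages trajectories that leave the region where the models were trained. The weight `alpha`, the hyperparameter defaults, and all callables below are illustrative assumptions, not the paper's settings (the paper also reports using CEM as an alternative optimizer).

```python
import torch

def regularized_cost(actions, state, dynamics, dae, reward_fn, alpha=1.0):
    """Negative predicted return plus a DAE penalty over the imagined rollout.
    `dynamics(state, action)`, `dae(x)`, and `reward_fn(s, a, s_next)` are
    hypothetical callables; `actions` is a (horizon, action_dim) tensor."""
    total_reward, penalty = 0.0, 0.0
    for a in actions:
        x = torch.cat([state, a], dim=-1)
        # ||g(x) - x||^2 is small when (state, action) lies in a high-density
        # region of the data the DAE (and dynamics model) were trained on.
        penalty = penalty + ((dae(x) - x) ** 2).sum()
        next_state = dynamics(state, a)
        total_reward = total_reward + reward_fn(state, a, next_state)
        state = next_state
    return -total_reward + alpha * penalty

def plan_with_adam(state, dynamics, dae, reward_fn,
                   horizon=25, action_dim=6, n_iters=100, lr=1e-2):
    """Gradient-based planning with Adam, one of the two optimizers the
    paper reports using. Hyperparameter values here are placeholders."""
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        cost = regularized_cost(actions, state, dynamics, dae, reward_fn)
        cost.backward()
        opt.step()
    return actions.detach()
```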