Regularizing Trajectory Optimization with Denoising Autoencoders

Authors: Rinu Boney, Norman Di Palo, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, Harri Valpola

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We show the effect of the proposed regularization for control in standard Mujoco environments: Cartpole, Reacher, Pusher, Half-cheetah and Ant available in [4]. See the description of the environments in Appendix B. We use the Probabilistic Ensembles with Trajectory Sampling (PETS) model from [5] as the baseline, which achieves the best reported results on all the considered tasks except for Ant. ... The learning progress of the compared algorithms is presented in Fig. 4." |
| Researcher Affiliation | Collaboration | Rinu Boney (Aalto University & Curious AI, rinu.boney@aalto.fi); Norman Di Palo (Sapienza University of Rome, normandipalo@gmail.com); Mathias Berglund (Curious AI); Alexander Ilin (Aalto University & Curious AI); Juho Kannala (Aalto University); Antti Rasmus (Curious AI); Harri Valpola (Curious AI) |
| Pseudocode | Yes | "Algorithm 1 End-to-end model-based reinforcement learning" (a hedged sketch of this loop appears after the table) |
| Open Source Code | No | The paper states: "Videos of our agents during training can be found at https://sites.google.com/view/regularizing-mbrl-with-dae/home." This link is for videos, not for source code. There is no statement or link indicating that the code for the proposed method is publicly available. |
| Open Datasets | Yes | "We show the effect of the proposed regularization for control in standard Mujoco environments: Cartpole, Reacher, Pusher, Half-cheetah and Ant available in [4]." |
| Dataset Splits | No | The paper describes collecting data ("Collect data D by random policy") and training models, but it does not specify explicit train/validation/test splits, whether as percentages, sample counts, or citations to predefined splits, for the data collected from the environment. |
| Hardware Specification | No | The paper does not report hardware details such as GPU or CPU models, processor types, or memory amounts used to run its experiments. |
| Software Dependencies | No | The paper mentions the MuJoCo environments and specific algorithms such as Adam and the cross-entropy method (CEM), but it does not list software dependencies with version numbers (e.g., "Python 3.x, PyTorch 1.x") needed for reproducibility. |
| Experiment Setup | Yes | "For all environments, we use a dynamics model with the same architecture: three hidden layers of size 200 with the Swish non-linearity [26]. ... We train the dynamics model for 100 or more epochs (see Appendix C) after every episode. ... Important hyperparameters used in our experiments are reported in the Appendix C. For DAE-regularized trajectory optimization we used either CEM or Adam as optimizers." (a minimal sketch of such a model follows the table) |
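
The paper's Algorithm 1 is only named in the Pseudocode row above, so the following Python sketch is a reconstruction under stated assumptions rather than the authors' implementation: `ToyEnv`, the least-squares dynamics fit, and the `denoising_penalty` stand-in for the DAE regularizer are all hypothetical simplifications. It illustrates the loop the algorithm title describes: collect data with a random policy, fit a dynamics model, then plan with CEM while penalizing imagined trajectories that stray from previously seen data, which is the role the paper assigns to the denoising autoencoder.

```python
"""Hedged sketch of an end-to-end model-based RL loop (in the spirit of Algorithm 1).

All names here are illustrative. The DAE regularizer is approximated by a
nearest-neighbor distance to seen states; the paper instead trains a DAE and
uses its denoising error.
"""
import numpy as np

rng = np.random.default_rng(0)

class ToyEnv:
    """1-D point mass; the goal is to drive the state to zero."""
    def reset(self):
        self.s = rng.uniform(-1, 1)
        return self.s
    def step(self, a):
        self.s = self.s + 0.1 * a      # simple linear dynamics
        return self.s, -self.s ** 2    # (next state, reward)

def fit_dynamics(data):
    """Least-squares fit of s' = s + w*a from observed transitions."""
    s, a, s_next = (np.array(x) for x in zip(*data))
    w = np.sum((s_next - s) * a) / (np.sum(a * a) + 1e-8)
    return lambda s_, a_: s_ + w * a_

def denoising_penalty(traj_states, seen_states, sigma=0.2):
    """Stand-in for the DAE term: penalize states far from the training data."""
    d = np.abs(traj_states[:, None] - np.array(seen_states)[None, :])
    return np.mean(np.min(d, axis=1) / sigma)

def cem_plan(s0, model, seen_states, horizon=5, pop=64, elites=8, iters=3, reg=1.0):
    """Cross-entropy method over action sequences, with the DAE-style penalty."""
    mu, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        cands = mu + std * rng.standard_normal((pop, horizon))
        scores = []
        for acts in cands:
            s, ret, states = s0, 0.0, []
            for a in acts:
                s = model(s, a)
                states.append(s)
                ret += -s ** 2         # imagined reward under the model
            scores.append(ret - reg * denoising_penalty(np.array(states), seen_states))
        elite = cands[np.argsort(scores)[-elites:]]
        mu, std = elite.mean(0), elite.std(0) + 1e-3
    return mu[0]                       # execute the first action of the plan

env, data = ToyEnv(), []
s = env.reset()
for _ in range(20):                    # seed the dataset with a random policy
    a = rng.uniform(-1, 1)
    s_next, _ = env.step(a)
    data.append((s, a, s_next)); s = s_next

for episode in range(3):               # alternate model fitting and planning
    model = fit_dynamics(data)
    seen = [t[0] for t in data]
    s = env.reset()
    for _ in range(10):
        a = cem_plan(s, model, seen)
        s_next, r = env.step(a)
        data.append((s, a, s_next)); s = s_next
```

In the paper itself the dynamics model is a probabilistic ensemble (PETS) and the regularizer is a learned DAE; the nearest-neighbor distance above merely mimics the "stay near the training data" effect that the denoising error provides.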
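
The Experiment Setup row quotes the one architectural detail the paper fixes across environments: three hidden layers of 200 units with the Swish non-linearity. A minimal PyTorch sketch of such a network is shown below; the class name, state/action dimensions, and the deterministic single-network form are assumptions (the paper's baseline is a probabilistic ensemble, per PETS). Swish is available in PyTorch as `nn.SiLU`.

```python
# Minimal sketch of a dynamics model matching the quoted architecture:
# three hidden layers of 200 units with Swish (SiLU). Dimensions are
# placeholders; the paper's full probabilistic-ensemble setup is omitted.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 200), nn.SiLU(),
            nn.Linear(200, 200), nn.SiLU(),
            nn.Linear(200, 200), nn.SiLU(),
            nn.Linear(200, state_dim),  # predicts the next state (or a delta)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

# Example with Half-cheetah-sized placeholder dimensions (17-D state, 6-D action).
model = DynamicsModel(state_dim=17, action_dim=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, as in the paper
```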