Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Regularizing Trajectory Optimization with Denoising Autoencoders
Authors: Rinu Boney, Norman Di Palo, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, Harri Valpola
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the effect of the proposed regularization for control in standard Mujoco environments: Cartpole, Reacher, Pusher, Half-cheetah and Ant available in [4]. See the description of the environments in Appendix B. We use the Probabilistic Ensembles with Trajectory Sampling (PETS) model from [5] as the baseline, which achieves the best reported results on all the considered tasks except for Ant. ... The learning progress of the compared algorithms is presented in Fig. 4. |
| Researcher Affiliation | Collaboration | Rinu Boney (Aalto University & Curious AI); Norman Di Palo (Sapienza University of Rome); Mathias Berglund (Curious AI); Alexander Ilin (Aalto University & Curious AI); Juho Kannala (Aalto University); Antti Rasmus (Curious AI); Harri Valpola (Curious AI) |
| Pseudocode | Yes | Algorithm 1 End-to-end model-based reinforcement learning |
| Open Source Code | No | The paper mentions: 'Videos of our agents during training can be found at https://sites.google.com/view/regularizing-mbrl-with-dae/home.' This link is for videos, not for the source code of their methodology. There is no explicit statement or link indicating that the code for their proposed method is publicly available. |
| Open Datasets | Yes | We show the effect of the proposed regularization for control in standard Mujoco environments: Cartpole, Reacher, Pusher, Half-cheetah and Ant available in [4]. |
| Dataset Splits | No | The paper describes collecting data ('Collect data D by random policy') and training models, but it does not specify explicit train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits for the data collected from the environment. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Mujoco environments' and refers to specific algorithms like 'Adam' and 'cross-entropy method (CEM)', but it does not provide specific ancillary software details with version numbers (e.g., 'Python 3.x, PyTorch 1.x') for reproducibility. |
| Experiment Setup | Yes | For all environments, we use a dynamics model with the same architecture: three hidden layers of size 200 with the Swish non-linearity [26]. ... We train the dynamics model for 100 or more epochs (see Appendix C) after every episode. ... Important hyperparameters used in our experiments are reported in the Appendix C. For DAE-regularized trajectory optimization we used either CEM or Adam as optimizers. |
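The Experiment Setup row notes that the authors used either CEM or Adam as optimizers for DAE-regularized trajectory optimization. For readers unfamiliar with it, here is a minimal pure-Python sketch of the cross-entropy method (CEM) as a generic optimizer; the hyperparameters (population size, iteration count, elite fraction) and the quadratic test function are illustrative choices, not the paper's settings:

```python
import random
import statistics

def cem_optimize(cost_fn, dim, iters=20, pop=200, elite_frac=0.1, seed=0):
    """Minimize cost_fn over R^dim with the cross-entropy method (CEM).

    Sketch only: hyperparameter values here are illustrative defaults,
    not those reported in the paper's Appendix C.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        # Sample a population of candidates from the current Gaussian.
        samples = [[rng.gauss(mean[d], std[d]) for d in range(dim)]
                   for _ in range(pop)]
        # Rank candidates by cost and keep the elite fraction.
        samples.sort(key=cost_fn)
        elites = samples[:n_elite]
        # Refit the sampling distribution to the elites.
        mean = [statistics.mean(e[d] for e in elites) for d in range(dim)]
        std = [statistics.stdev([e[d] for e in elites]) + 1e-6
               for d in range(dim)]
    return mean

# Example: minimize a simple quadratic whose optimum is at (1, -2).
best = cem_optimize(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, dim=2)
```

In the paper's setting, `cost_fn` would be the negative predicted return of an action sequence under the learned dynamics model (plus the DAE regularization term), rather than a fixed analytic function.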