Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Regularizing Trajectory Optimization with Denoising Autoencoders

Authors: Rinu Boney, Norman Di Palo, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, Harri Valpola

NeurIPS 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
"We show the effect of the proposed regularization for control in standard Mujoco environments: Cartpole, Reacher, Pusher, Half-cheetah and Ant available in [4]. See the description of the environments in Appendix B. We use the Probabilistic Ensembles with Trajectory Sampling (PETS) model from [5] as the baseline, which achieves the best reported results on all the considered tasks except for Ant. ... The learning progress of the compared algorithms is presented in Fig. 4."

Researcher Affiliation: Collaboration
Rinu Boney (Aalto University & Curious AI, EMAIL); Norman Di Palo (Sapienza University of Rome, EMAIL); Mathias Berglund (Curious AI); Alexander Ilin (Aalto University & Curious AI); Juho Kannala (Aalto University); Antti Rasmus (Curious AI); Harri Valpola (Curious AI)

Pseudocode: Yes
"Algorithm 1 End-to-end model-based reinforcement learning"

Open Source Code: No
The paper mentions: "Videos of our agents during training can be found at https://sites.google.com/view/regularizing-mbrl-with-dae/home." This link is for videos, not for the source code of their method. There is no explicit statement or link indicating that the code for the proposed method is publicly available.

Open Datasets: Yes
"We show the effect of the proposed regularization for control in standard Mujoco environments: Cartpole, Reacher, Pusher, Half-cheetah and Ant available in [4]."

Dataset Splits: No
The paper describes collecting data ("Collect data D by random policy") and training models, but it does not specify explicit train/validation/test splits with percentages, sample counts, or citations to predefined splits for the data collected from the environment.

Hardware Specification: No
The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for its experiments.

Software Dependencies: No
The paper mentions using "Mujoco environments" and refers to specific algorithms such as "Adam" and the "cross-entropy method (CEM)", but it does not list software dependencies with version numbers (e.g., "Python 3.x, PyTorch 1.x") needed for reproducibility.

Experiment Setup: Yes
"For all environments, we use a dynamics model with the same architecture: three hidden layers of size 200 with the Swish non-linearity [26]. ... We train the dynamics model for 100 or more epochs (see Appendix C) after every episode. ... Important hyperparameters used in our experiments are reported in the Appendix C. For DAE-regularized trajectory optimization we used either CEM or Adam as optimizers."
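The "Experiment Setup" quote fixes only the dynamics-model architecture: three hidden layers of width 200 with the Swish non-linearity. A minimal NumPy sketch of that forward pass is below; the input/output dimensions, the initialization scheme, and the mapping from a state-action pair to a next-state prediction are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def swish(x):
    # Swish non-linearity: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def init_mlp(in_dim, out_dim, hidden=200, n_hidden=3, seed=0):
    """Random weights for an MLP with three hidden layers of width 200
    (hidden sizes per the quoted setup; initialization is an assumption)."""
    rng = np.random.default_rng(seed)
    sizes = [in_dim] + [hidden] * n_hidden + [out_dim]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass: Swish on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = swish(x @ W + b)
    W, b = params[-1]
    return x @ W + b

# Hypothetical dimensions: 17-dim state, 6-dim action (not from the paper).
params = init_mlp(in_dim=17 + 6, out_dim=17)
x = np.zeros((1, 23))  # one concatenated state-action input
print(forward(params, x).shape)  # (1, 17): predicted next state
```

In a PETS-style setup one would train an ensemble of such models on observed transitions; this sketch shows only the shared architecture named in the quote.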