Model-Free Trajectory Optimization for Reinforcement Learning

Authors: Riad Akrour, Gerhard Neumann, Hany Abdulsamad, Abbas Abdolmaleki

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental section demonstrates that, on tasks with highly non-linear dynamics, MOTO outperforms similar methods that rely on a linearization of these dynamics. Additionally, it is shown on a simulated Robot Table Tennis Task that MOTO is able to scale to high-dimensional tasks while keeping the sample complexity relatively low, making it amenable to direct application to a physical system.
Researcher Affiliation | Academia | Riad Akrour (1) AKROUR@IAS.TU-DARMSTADT.DE; Abbas Abdolmaleki (3) ABBAS.A@UA.PT; Hany Abdulsamad (2) ABDULSAMAD@IAS.TU-DARMSTADT.DE; Gerhard Neumann (1) NEUMANN@IAS.TU-DARMSTADT.DE. 1: CLAS and 2: IAS, TU Darmstadt, Darmstadt, Germany; 3: IEETA, University of Aveiro, Aveiro, Portugal.
Pseudocode | Yes | Algorithm 1: Model-Free Trajectory Optimization (MOTO).
Open Source Code | No | The paper provides no statements about, or links to, open-source code for the described methodology.
Open Datasets | No | The paper uses simulated environments such as the "multi-link swing-up tasks" and the "simulated Robot Table Tennis Task". It does not provide access information (links, DOIs, or formal citations) for these environments or for any external datasets.
Dataset Splits | No | The paper describes sampling "M rollouts" per iteration and discusses sample reuse, but it does not specify explicit train/validation/test splits (percentages, sample counts, or predefined external splits) for reproducibility.
Hardware Specification | No | The paper does not report hardware details such as GPU or CPU models, memory, or cloud computing instance types used for the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | "Input: Initial policy π0, number of trajectories per iteration M, step-size ϵ and entropy reduction rate β0..." The number of rollouts per iteration is reduced to M = 20. (A minimal loop sketch based on these inputs follows the table.)
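
The sketch below illustrates the kind of iteration loop implied by the inputs quoted above (initial policy π0, M rollouts per iteration, step-size ϵ, entropy reduction rate β0). It is not the paper's Algorithm 1: the toy 1-D task, the exponentiated-return reweighting with per-time-step weighted regression, and all helper names and constants are illustrative assumptions standing in for MOTO's fitted quadratic Q-function and closed-form KL- and entropy-constrained policy update.

```python
# Minimal sketch of a MOTO-like outer loop (assumptions noted above).
import numpy as np

HORIZON = 20          # time steps per episode (illustrative)
M = 20                # rollouts per iteration (the paper reduces M to 20)
EPSILON = 0.1         # "step-size"; used here as a simple interpolation rate
BETA_0 = 0.05         # "entropy reduction rate"; used here to anneal noise
N_ITERS = 50

def rollout(K, k, sigma, rng):
    """One episode on a toy 1-D linear task with a time-dependent
    linear-Gaussian policy u_t ~ N(K_t * x_t + k_t, sigma_t^2)."""
    x = rng.normal(1.0, 0.1)
    states, actions, ret = [], [], 0.0
    for t in range(HORIZON):
        u = K[t] * x + k[t] + sigma[t] * rng.normal()
        states.append(x)
        actions.append(u)
        ret += -(x ** 2 + 0.1 * u ** 2)   # quadratic cost as negative reward
        x = x + 0.1 * u                   # simple linear dynamics
    return np.array(states), np.array(actions), ret

def main():
    rng = np.random.default_rng(0)
    # Initial time-dependent linear-Gaussian policy pi_0.
    K = np.zeros(HORIZON)
    k = np.zeros(HORIZON)
    sigma = np.ones(HORIZON)

    for it in range(N_ITERS):
        # 1. Collect M rollouts with the current policy.
        batch = [rollout(K, k, sigma, rng) for _ in range(M)]
        returns = np.array([r for (_, _, r) in batch])

        # 2. Exponentiated-return weights (a REPS-like reweighting standing in
        #    for MOTO's quadratic Q-model and constrained update).
        w = np.exp((returns - returns.max()) / (returns.std() + 1e-8))
        w /= w.sum()

        # 3. Per-time-step weighted linear regression of actions on states.
        for t in range(HORIZON):
            xs = np.array([s[t] for (s, _, _) in batch])
            us = np.array([a[t] for (_, a, _) in batch])
            X = np.stack([xs, np.ones_like(xs)], axis=1)
            W = np.diag(w)
            theta = np.linalg.solve(X.T @ W @ X + 1e-6 * np.eye(2),
                                    X.T @ W @ us)
            # Interpolate toward the new gains with rate EPSILON.
            K[t] = (1 - EPSILON) * K[t] + EPSILON * theta[0]
            k[t] = (1 - EPSILON) * k[t] + EPSILON * theta[1]
            # Anneal exploration noise, mimicking an entropy reduction rate.
            sigma[t] = max(0.05, sigma[t] * np.exp(-BETA_0))

        print(f"iter {it:02d}  mean return {returns.mean():8.3f}")

if __name__ == "__main__":
    main()
```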