Model-Free Trajectory Optimization for Reinforcement Learning
Authors: Riad Akrour, Gerhard Neumann, Hany Abdulsamad, Abbas Abdolmaleki
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental section demonstrates that, on tasks with highly non-linear dynamics, MOTO outperforms similar methods that rely on a linearization of these dynamics. Additionally, it is shown on a simulated Robot Table Tennis Task that MOTO is able to scale to high-dimensional tasks while keeping the sample complexity low enough to be amenable to a direct application to a physical system. |
| Researcher Affiliation | Academia | Riad Akrour (1, AKROUR@IAS.TU-DARMSTADT.DE), Abbas Abdolmaleki (3, ABBAS.A@UA.PT), Hany Abdulsamad (2, ABDULSAMAD@IAS.TU-DARMSTADT.DE), Gerhard Neumann (1, NEUMANN@IAS.TU-DARMSTADT.DE); 1: CLAS, 2: IAS, TU Darmstadt, Darmstadt, Germany; 3: IEETA, University of Aveiro, Aveiro, Portugal |
| Pseudocode | Yes | Algorithm 1 Model-Free Trajectory Optimization (MOTO) |
| Open Source Code | No | The paper does not provide any explicit statements or links to open-source code for the described methodology. |
| Open Datasets | No | The paper uses simulated environments such as the "multi-link swing-up tasks" and the "simulated Robot Table Tennis Task". It does not provide access information (links, DOIs, or formal citations) for these simulated environments, nor for any external datasets, in a publicly available form. |
| Dataset Splits | No | The paper describes using "M rollouts" for sampling and discusses sample reuse, but it does not specify explicit train/validation/test dataset splits (e.g., percentages, sample counts, or predefined external splits) for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory specifications, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | Input: Initial policy π0, number of trajectories per iteration M, step-size ϵ and entropy reduction rate β0... The number of rollouts per iteration is reduced to M = 20. |
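
Since Algorithm 1 and the quoted setup line are the only algorithmic details surfaced in this table, the following is a minimal sketch of how the quoted inputs (an initial policy, M = 20 rollouts per iteration, a KL step-size ϵ, and an entropy reduction rate β0) could fit into a MOTO-style outer loop. The toy dynamics, reward, and the simplified mean/variance updates below are illustrative assumptions for a 1-D system, not the authors' implementation; only the loop structure and the role of the listed hyperparameters follow the paper's pseudocode inputs.

```python
# Hedged sketch of a MOTO-style outer loop (cf. Algorithm 1) on a toy 1-D system.
# The dynamics, reward, and simplified policy update are assumptions made for
# illustration; they are not the paper's method.
import numpy as np

T = 20          # horizon
M = 20          # rollouts per iteration, as quoted in the "Experiment Setup" row
eps = 0.1       # KL step-size epsilon (assumed value)
beta0 = 0.05    # entropy reduction rate (assumed value)
iters = 30

rng = np.random.default_rng(0)

# Time-dependent linear-Gaussian policy: pi_t(u | x) = N(K_t * x + k_t, sig_t^2)
K = np.zeros(T)
k = np.zeros(T)
sig = np.ones(T)

def rollout():
    """One trajectory on a toy system x' = x + u + noise with reward -x^2 - 0.1*u^2."""
    xs, us, rs = np.zeros(T), np.zeros(T), np.zeros(T)
    x = rng.normal(1.0, 0.1)
    for t in range(T):
        u = K[t] * x + k[t] + sig[t] * rng.normal()
        r = -(x ** 2) - 0.1 * (u ** 2)
        xs[t], us[t], rs[t] = x, u, r
        x = x + u + 0.05 * rng.normal()
    return xs, us, rs

for it in range(iters):
    data = [rollout() for _ in range(M)]
    returns = np.array([rs.sum() for _, _, rs in data])

    # Per time step: fit a crude linear model of the reward-to-go in (x, u) from the
    # M rollouts and nudge the policy mean in the improving direction. This stands in
    # for MOTO's local Q-function model and KL-constrained policy update.
    for t in range(T):
        X = np.array([[xs[t], us[t], 1.0] for xs, us, _ in data])
        y = np.array([rs[t:].sum() for _, _, rs in data])       # reward-to-go targets
        w, *_ = np.linalg.lstsq(X, y, rcond=None)                # linear Q_t surrogate
        grad_u = w[1]                                            # d(reward-to-go) / d(u)
        step = eps / (abs(grad_u) + 1e-8)                        # crude trust-region-like scaling
        k[t] += step * grad_u                                    # shift the mean action
        sig[t] = max(sig[t] * (1.0 - beta0), 1e-3)               # entropy reduction over iterations

    print(f"iter {it:2d}  mean return {returns.mean():7.3f}")
```

Under these assumptions the sketch only illustrates the reported experiment-loop structure (M rollouts per iteration, a step-size-limited policy update, and gradual entropy reduction); reproducing the paper's results would require the actual quadratic Q-function model and the KL-constrained update described in Algorithm 1.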