Model-Based Reinforcement Learning in Continuous Environments Using Real-Time Constrained Optimization

Authors: Olov Andersson, Fredrik Heintz, Patrick Doherty

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The efficacy of the approach is demonstrated on both an extended cart pole domain and a challenging quadcopter navigation task using real data.
Researcher Affiliation | Academia | Olov Andersson, Fredrik Heintz and Patrick Doherty, {olov.a.andersson, fredrik.heintz, patrick.doherty}@liu.se, Department of Computer and Information Science, Linköping University, SE-58183 Linköping, Sweden
Pseudocode | Yes | Algorithm 1 RL-RCO: Real-time Constrained Opt.
Open Source Code | No | The paper does not provide any specific repository links or explicit statements about the release of source code for the described methodology.
Open Datasets | No | Initially 8 minutes of training data was collected from manual flight using on-board sensors as well as an external camera-based positioning system, resulting in about 12 000 samples at the controller target time step of 0.04 s. An initial data collection episode was used to bootstrap the agent, using sinusoid actions capped to 30 interactions. The paper uses data collected by the authors and does not provide access to it or explicitly state its public availability.
Dataset Splits | No | The paper mentions 'training data' but does not provide explicit details on dataset splits, such as percentages for training, validation, or testing sets, or the methodology for creating these splits.
Hardware Specification | No | The paper mentions 'Using one core of a desktop CPU all agents could run faster than real-time' but does not specify the exact model or detailed specifications of the CPU or any other hardware components used for the experiments.
Software Dependencies | No | The paper references algorithms and tools such as 'Gaussian processes' and the 'FORCES (Domahidi et al. 2012) stage-wise LTV MPC solver', but does not list specific software dependencies with version numbers required for reproduction.
Experiment Setup | Yes | Dynamics models for the state transition distribution p(x_t ∣ x_{t-1}, a_{t-1}) were iteratively learned between episodes for both linear and angular accelerations using sparse Gaussian processes with 40 inducing inputs and ARD kernels. The maximum number of interactions of the RL episodes was then set to 200 and the utility function to the negative square distance to the target position. We take a soft real-time approach by running a fixed number of linearizations per interaction cycle, which in our experiments was set to 4.
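Since the authors release no code (see the Open Source Code row above), the following is only a minimal sketch of what the quoted experiment setup could look like: one sparse Gaussian process acceleration model with 40 inducing inputs and an ARD kernel, plus the negative-square-distance utility. GPflow, the random placeholder data, and the utility helper are assumptions made for illustration; this is not the authors' implementation, and the constrained LTV MPC solver (FORCES) and the episode loop of Algorithm 1 (RL-RCO) are omitted.

```python
# Illustrative sketch only (not the authors' code): a sparse GP dynamics
# model with 40 inducing inputs and an ARD kernel, as described in the
# Experiment Setup row. GPflow and the placeholder data are assumptions.
import numpy as np
import gpflow

rng = np.random.default_rng(0)

# Placeholder transition data: inputs are (state, action) at time t-1,
# targets are one acceleration component at time t (one GP per output).
N, D_IN = 500, 8
X = rng.standard_normal((N, D_IN))
Y = rng.standard_normal((N, 1))

# ARD kernel: one lengthscale per input dimension.
kernel = gpflow.kernels.SquaredExponential(lengthscales=np.ones(D_IN))

# 40 inducing inputs, initialized from a random subset of the data.
Z = X[rng.choice(N, size=40, replace=False)].copy()
model = gpflow.models.SGPR(data=(X, Y), kernel=kernel, inducing_variable=Z)

# Fit hyperparameters and inducing inputs by maximizing the SGPR bound.
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Predictive mean/variance of the acceleration for new (state, action) pairs.
mean, var = model.predict_f(X[:5])

# Utility as quoted above: negative squared distance to the target position.
def utility(position, target):
    return -float(np.sum((position - target) ** 2))
```

In the paper's setup such a model would be refit between episodes on the accumulated transition data and then linearized a fixed number of times (4 per interaction cycle) inside the real-time constrained optimization; that control loop is not reproduced here.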