Model-Based Reinforcement Learning in Continuous Environments Using Real-Time Constrained Optimization

Authors: Olov Andersson, Fredrik Heintz, Patrick Doherty

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The efficacy of the approach is demonstrated on both an extended cart pole domain and a challenging quadcopter navigation task using real data.
Researcher Affiliation | Academia | Olov Andersson, Fredrik Heintz and Patrick Doherty, {olov.a.andersson, fredrik.heintz, patrick.doherty}@liu.se, Department of Computer and Information Science, Linköping University, SE-58183 Linköping, Sweden
Pseudocode | Yes | Algorithm 1 RL-RCO: Real-time Constrained Opt.
Open Source Code | No | The paper does not provide any specific repository links or explicit statements about the release of source code for the described methodology.
Open Datasets | No | Initially 8 minutes of training data was collected from manual flight using on-board sensors as well as an external camera-based positioning system, resulting in about 12 000 samples at the controller target time step of 0.04 s. An initial data collection episode was used to bootstrap the agent, using sinusoid actions capped to 30 interactions. The paper uses data collected by the authors and does not provide access to it or explicitly state its public availability.
Dataset Splits | No | The paper mentions 'training data' but does not provide explicit details on dataset splits, such as percentages for training, validation, or testing sets, or the methodology for creating these splits.
Hardware Specification | No | The paper mentions 'Using one core of a desktop CPU all agents could run faster than real-time' but does not specify the exact model or detailed specifications of the CPU or any other hardware components used for the experiments.
Software Dependencies | No | The paper references algorithms and tools such as 'Gaussian processes' and the 'FORCES (Domahidi et al. 2012) stage-wise LTV MPC solver', but does not list specific software dependencies with version numbers required for reproduction.
Experiment Setup | Yes | Dynamics models for the state transition distribution p(x_t ∣ x_{t-1}, a_{t-1}) were iteratively learned between episodes for both linear and angular accelerations using sparse Gaussian processes with 40 inducing inputs and ARD kernels. The maximum number of interactions of the RL episodes was then set to 200 and the utility function to the negative square distance to the target position. We take a soft real-time approach by running a fixed number of linearizations per interaction cycle, which in our experiments was set to 4.
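Since the authors release no code (see the Open Source Code row above), the following is only a minimal sketch of what the quoted experiment setup could look like: one sparse Gaussian process acceleration model with 40 inducing inputs and an ARD kernel, plus the negative-square-distance utility. GPflow, the random placeholder data, and the utility helper are assumptions made for illustration; this is not the authors' implementation, and the constrained LTV MPC solver (FORCES) and the episode loop of Algorithm 1 (RL-RCO) are omitted.

```python
# Illustrative sketch only (not the authors' code): a sparse GP dynamics
# model with 40 inducing inputs and an ARD kernel, as described in the
# Experiment Setup row. GPflow and the placeholder data are assumptions.
import numpy as np
import gpflow

rng = np.random.default_rng(0)

# Placeholder transition data: inputs are (state, action) at time t-1,
# targets are one acceleration component at time t (one GP per output).
N, D_IN = 500, 8
X = rng.standard_normal((N, D_IN))
Y = rng.standard_normal((N, 1))

# ARD kernel: one lengthscale per input dimension.
kernel = gpflow.kernels.SquaredExponential(lengthscales=np.ones(D_IN))

# 40 inducing inputs, initialized from a random subset of the data.
Z = X[rng.choice(N, size=40, replace=False)].copy()
model = gpflow.models.SGPR(data=(X, Y), kernel=kernel, inducing_variable=Z)

# Fit hyperparameters and inducing inputs by maximizing the SGPR bound.
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Predictive mean/variance of the acceleration for new (state, action) pairs.
mean, var = model.predict_f(X[:5])

# Utility as quoted above: negative squared distance to the target position.
def utility(position, target):
    return -float(np.sum((position - target) ** 2))
```

In the paper's setup such a model would be refit between episodes on the accumulated transition data and then linearized a fixed number of times (4 per interaction cycle) inside the real-time constrained optimization; that control loop is not reproduced here.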