reproducibilityindex.ai

Sample Efficient Reinforcement Learning with Gaussian Processes

Authors: Robert Grande, Thomas Walsh, Jonathan How

ICML 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our empirical results, including those on an F-16 simulator, show DGPQ is both sample efficient and orders of magnitude faster in per-sample computation than other PAC-MDP continuous-state learners.
Researcher Affiliation	Academia	Robert C. Grande RGRANDE@MIT.EDU Thomas J. Walsh THOMASJWALSH@GMAIL.COM Jonathan P. How JHOW@MIT.EDU Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139 USA
Pseudocode	Yes	Algorithm 1 Delayed GPQ (DGPQ)
Open Source Code	No	The paper does not provide an unambiguous statement or a direct link indicating that the source code for the methodology is openly available.
Open Datasets	No	The paper uses a '2-dimensional square' environment and an 'F16 simulator' for experiments, which are custom environments/simulators rather than publicly available datasets with specific access information.
Dataset Splits	No	The paper describes experiments on simulators but does not specify exact percentages, sample counts, or refer to predefined splits for training, validation, or testing datasets.
Hardware Specification	No	The paper mentions 'simulations were run in MATLAB' but does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies	No	The paper mentions 'MATLAB' but does not specify a version number. It also refers to 'sparse online GP approximation (Csató & Opper, 2002)', which is a method, not a specific versioned software dependency.
Experiment Setup	Yes	We used an $L_1$ distance metric, $L_Q = 9$, and an RBF kernel with $\theta = 0.05$, $\omega_n^2 = 0.1$ for the GP. We used the reward $r = -|h - h_d|/100\,\text{ft} - |\dot{h}|/100\,\text{ft/s} - |\delta_e|$, with aircraft height $h$, desired height $h_d$ and elevator angle (degrees) $\delta_e$. The control input was discretized as $\delta_e \in \{-1, 0, 1\}$ and the elevator was used to control the aircraft. The simulation time step size was 0.05 s and at each step, the air speed was perturbed with Gaussian noise $\mathcal{N}(0, 1)$ and the angle of attack was perturbed with Gaussian noise $\mathcal{N}(0, 0.01^2)$. An RBF kernel with $\theta = 0.05$, $\omega_n^2 = 0.1$ was used.
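
The setup quoted in the last row can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the authors' MATLAB code. It reads the reward as a sum of three penalties (each term negated), treats θ as the kernel length scale in exp(-||x - y||² / (2θ²)), and takes the second argument of each Gaussian as a variance. The paper may parameterize any of these differently. All function and constant names are hypothetical.

```python
import math
import random

# Values quoted in the Experiment Setup row.
ACTIONS = (-1.0, 0.0, 1.0)   # discretized elevator commands (degrees)
DT = 0.05                    # simulation time step (seconds)
THETA = 0.05                 # RBF kernel parameter
NOISE_VAR = 0.1              # GP measurement-noise variance (omega_n^2)


def reward(h, h_desired, h_rate, elevator_deg):
    """Assumed reward: negated altitude error, climb rate, and elevator use.

    h and h_desired are in feet, h_rate in feet per second.
    """
    return (-abs(h - h_desired) / 100.0
            - abs(h_rate) / 100.0
            - abs(elevator_deg))


def rbf_kernel(x, y, theta=THETA):
    """RBF kernel, ASSUMING theta is the length scale (not its square)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * theta ** 2))


def perturb(airspeed, angle_of_attack, rng):
    """Add per-step Gaussian noise: N(0, 1) and N(0, 0.01^2).

    random.Random.gauss takes a standard deviation, so the assumed
    variances 1 and 0.01^2 become sigmas 1.0 and 0.01.
    """
    return (airspeed + rng.gauss(0.0, 1.0),
            angle_of_attack + rng.gauss(0.0, 0.01))


if __name__ == "__main__":
    rng = random.Random(0)
    print(reward(1100.0, 1000.0, 100.0, 1.0))
    print(rbf_kernel((0.0, 0.0), (0.05, 0.0)))
    print(perturb(500.0, 0.05, rng))
```

Under these assumptions the reward is zero only when the aircraft holds the desired height with no climb rate and no elevator deflection, and every deviation lowers it.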