Sample Efficient Reinforcement Learning with Gaussian Processes

Authors: Robert Grande, Thomas Walsh, Jonathan How

ICML 2014

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: Our empirical results, including those on an F-16 simulator, show DGPQ is both sample efficient and orders of magnitude faster in per-sample computation than other PAC-MDP continuous-state learners.

Reproducibility Variable: Researcher Affiliation
Result: Academia
LLM Response: Robert C. Grande (RGRANDE@MIT.EDU), Thomas J. Walsh (THOMASJWALSH@GMAIL.COM), Jonathan P. How (JHOW@MIT.EDU); Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139 USA

Reproducibility Variable: Pseudocode
Result: Yes
LLM Response: Algorithm 1 Delayed GPQ (DGPQ)

Reproducibility Variable: Open Source Code
Result: No
LLM Response: The paper does not provide an unambiguous statement or a direct link indicating that the source code for the methodology is openly available.

Reproducibility Variable: Open Datasets
Result: No
LLM Response: The paper uses a '2-dimensional square' environment and an 'F16 simulator' for experiments, which are custom environments/simulators rather than publicly available datasets with specific access information.

Reproducibility Variable: Dataset Splits
Result: No
LLM Response: The paper describes experiments on simulators but does not specify exact percentages, sample counts, or refer to predefined splits for training, validation, or testing datasets.

Reproducibility Variable: Hardware Specification
Result: No
LLM Response: The paper mentions 'simulations were run in MATLAB' but does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: The paper mentions 'MATLAB' but does not specify a version number. It also refers to 'sparse online GP approximation (Csató & Opper, 2002)', which is a method, not a specific versioned software dependency.

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: We used an L1 distance metric, L_Q = 9, and an RBF kernel with θ = 0.05, ω_n² = 0.1 for the GP. We used the reward r = −|h − h_d|/100 ft − |ḣ|/100 ft/s − |δ_e|, with aircraft height h, desired height h_d and elevator angle (degrees) δ_e. The control input was discretized as δ_e ∈ {−1, 0, 1} and the elevator was used to control the aircraft. The simulation time step size was 0.05 s and at each step, the air speed was perturbed with Gaussian noise N(0, 1) and the angle of attack was perturbed with Gaussian noise N(0, 0.01²). An RBF kernel with θ = 0.05, ω_n² = 0.1 was used.
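
The quantities quoted in the Experiment Setup entry can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code. All function and constant names are hypothetical. The kernel is assumed to take the common form exp(-||x - y||^2 / (2θ^2)), since the entry gives θ but not the formula. The reward is read with every term negated, so that deviations from the desired height are penalized. The second argument of N(·, ·) is read as a variance, giving standard deviations of 1 and 0.01.

```python
import math
import random

THETA = 0.05                 # RBF bandwidth θ quoted in the setup entry
GP_NOISE_VAR = 0.1           # GP noise variance ω_n² quoted in the setup entry
ACTIONS = (-1.0, 0.0, 1.0)   # discretized elevator angle δ_e, in degrees
TIME_STEP = 0.05             # simulation time step, in seconds


def rbf_kernel(x, y, theta=THETA):
    """RBF kernel between two state vectors.

    Assumed form: exp(-||x - y||^2 / (2 * theta^2)); the source only gives θ.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * theta ** 2))


def reward(h, h_d, h_dot, delta_e):
    """Reward as read from the setup entry (signs are an assumption).

    h and h_d are heights in ft, h_dot is the climb rate in ft/s,
    delta_e is the elevator angle in degrees.
    """
    return -abs(h - h_d) / 100.0 - abs(h_dot) / 100.0 - abs(delta_e)


def perturb(airspeed, angle_of_attack, rng):
    """Add the per-step Gaussian noise described in the setup entry.

    Assumes N(0, 1) and N(0, 0.01^2) are given as (mean, variance).
    """
    noisy_airspeed = airspeed + rng.gauss(0.0, 1.0)
    noisy_alpha = angle_of_attack + rng.gauss(0.0, 0.01)
    return noisy_airspeed, noisy_alpha


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    print(reward(h=1100.0, h_d=1000.0, h_dot=-50.0, delta_e=ACTIONS[2]))
    print(rbf_kernel((0.0, 0.0), (0.05, 0.0)))
    print(perturb(500.0, 0.03, rng))
```

With these readings, the reward is zero only when the aircraft holds the desired height with no climb rate and a neutral elevator, and it becomes more negative as any of the three terms grows.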