Sample Efficient Reinforcement Learning with Gaussian Processes
Authors: Robert Grande, Thomas Walsh, Jonathan How
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results, including those on an F-16 simulator, show DGPQ is both sample efficient and orders of magnitude faster in per-sample computation than other PAC-MDP continuous-state learners. |
| Researcher Affiliation | Academia | Robert C. Grande (RGRANDE@MIT.EDU), Thomas J. Walsh (THOMASJWALSH@GMAIL.COM), Jonathan P. How (JHOW@MIT.EDU); Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139 USA |
| Pseudocode | Yes | Algorithm 1 Delayed GPQ (DGPQ) |
| Open Source Code | No | The paper does not provide an unambiguous statement or a direct link indicating that the source code for the methodology is openly available. |
| Open Datasets | No | The paper uses a '2-dimensional square' environment and an 'F16 simulator' for experiments, which are custom environments/simulators rather than publicly available datasets with specific access information. |
| Dataset Splits | No | The paper describes experiments on simulators but does not specify exact percentages, sample counts, or refer to predefined splits for training, validation, or testing datasets. |
| Hardware Specification | No | The paper mentions 'simulations were run in MATLAB' but does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions 'MATLAB' but does not specify a version number. It also refers to 'sparse online GP approximation (Csató & Opper, 2002)', which is a method, not a specific versioned software dependency. |
| Experiment Setup | Yes | We used an $L_1$ distance metric, $L_Q = 9$, and an RBF kernel with $\theta = 0.05$, $\omega_n^2 = 0.1$ for the GP. We used the reward $r = -\lvert h - h_d \rvert / 100\,\text{ft} - \lvert \dot{h} \rvert / 100\,\text{ft/s} - \lvert \delta_e \rvert$, with aircraft height $h$, desired height $h_d$ and elevator angle (degrees) $\delta_e$. The control input was discretized as $\delta_e \in \{-1, 0, 1\}$ and the elevator was used to control the aircraft. The simulation time step size was 0.05 s and at each step, the air speed was perturbed with Gaussian noise $\mathcal{N}(0, 1)$ and the angle of attack was perturbed with Gaussian noise $\mathcal{N}(0, 0.01^2)$. A RBF kernel with $\theta = 0.05$, $\omega_n^2 = 0.1$ was used. |
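
The F-16 setup quoted in the Experiment Setup row can be written down as a small configuration sketch. The paper's experiments were run in MATLAB and no code is released, so the Python below is only an illustration. It makes three assumptions: each reward term enters as a penalty (negative sign), the second term is the climb rate in ft/s, and the RBF kernel has the common form exp(-||x - y||^2 / (2 θ^2)). The names `reward`, `rbf_kernel`, `ACTIONS`, `DT`, `THETA`, and `NOISE_VAR` are hypothetical.

```python
import math

# Values quoted in the Experiment Setup row.
ACTIONS = (-1.0, 0.0, 1.0)  # discretized elevator angles, in degrees
DT = 0.05                   # simulation time step, in seconds
THETA = 0.05                # RBF kernel bandwidth
NOISE_VAR = 0.1             # GP measurement-noise variance (omega_n^2)


def reward(h, h_desired, h_dot, delta_e):
    """Reward for one step of the altitude-hold task.

    Assumption: every term is a penalty, so the reward is at most zero.
    h and h_desired are in ft, h_dot in ft/s, delta_e in degrees.
    """
    return -abs(h - h_desired) / 100.0 - abs(h_dot) / 100.0 - abs(delta_e)


def rbf_kernel(x, y, theta=THETA):
    """Squared-exponential kernel between two equal-length vectors.

    Assumption: the bandwidth enters as 2 * theta**2 in the denominator;
    the paper may use a different parameterization.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * theta ** 2))
```

With these definitions, `reward(1100.0, 1000.0, 50.0, 1.0)` evaluates to `-2.5` (a 100 ft error, a 50 ft/s climb rate, and one degree of elevator), and the kernel returns 1 for identical inputs and decays toward 0 as the inputs move apart.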