Fourier Policy Gradients
Authors: Matthew Fellows, Kamil Ciosek, Shimon Whiteson
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While the main contribution of this paper is theoretical, we also provide an empirical evaluation using a periodic critic on a simple turntable problem that demonstrates the practical benefit of using a trigonometric critic. We evaluated a periodic critic of this form on a toy turntable domain where the goal is to rotate a flat record to the desired position by rotating it (see Appendix D for details). We compared it to the DPG baseline from OpenAI (Dhariwal et al., 2017), which uses a neural network based critic capable of addressing complex control tasks. As expected, the learning curves in Figure 1 show that using a periodic critic (F-EPG) leads to faster learning, because it encodes more information about the action space than a generic neural network. |
| Researcher Affiliation | Academia | Matthew Fellows*, Kamil Ciosek*, Shimon Whiteson — Department of Computer Science, University of Oxford, United Kingdom. Correspondence to: Matthew Fellows <matthew.fellows@cs.ox.ac.uk>. |
| Pseudocode | Yes | Algorithm 1 Expected Policy Gradient |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology described is publicly available. |
| Open Datasets | No | We evaluated a periodic critic of this form on a toy turntable domain where the goal is to rotate a flat record to the desired position by rotating it (see Appendix D for details). (Explanation: The paper mentions a "toy turntable domain" but does not provide any access information (link, DOI, formal citation) for this environment or associated data as a publicly available dataset.) |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., train/validation/test percentages or counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud computing specifications). |
| Software Dependencies | No | The paper mentions using 'OpenAI Baselines' but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper describes the 'turntable domain' and compares with a DPG baseline, but it does not explicitly provide hyperparameter values (e.g., learning rate, batch size, optimizer settings) or detailed training configurations for the experiments. |
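The "periodic critic" the table refers to is a critic built from trigonometric features of the action, so that the learned Q-function is periodic in the action by construction — a natural fit for a turntable domain where the action is an angle. The sketch below is purely illustrative (the harmonic count, weights, and function names are not taken from the paper, which does not release code):

```python
import math

def periodic_features(theta, num_harmonics=3):
    """Trigonometric feature map for a scalar angle action.

    Returns [1, cos(theta), sin(theta), cos(2*theta), sin(2*theta), ...].
    Any critic linear in these features is 2*pi-periodic in the action,
    encoding the rotational structure of the action space directly.
    """
    feats = [1.0]
    for k in range(1, num_harmonics + 1):
        feats.append(math.cos(k * theta))
        feats.append(math.sin(k * theta))
    return feats

# Illustrative weights: in practice these would be learned, and would
# typically depend on the state as well as the action.
weights = [0.5, 1.0, -0.3, 0.2, 0.0, -0.1, 0.4]

def q_value(theta):
    """A critic linear in periodic features: Q(a) = w . phi(a)."""
    return sum(w * f for w, f in zip(weights, periodic_features(theta)))
```

Because the feature map contains only sines and cosines of integer harmonics, `q_value(theta)` and `q_value(theta + 2*math.pi)` agree exactly — the periodicity a generic neural-network critic would have to learn from data is here imposed by the representation.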