Fourier Policy Gradients

Authors: Matthew Fellows, Kamil Ciosek, Shimon Whiteson

ICML 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "While the main contribution of this paper is theoretical, we also provide an empirical evaluation using a periodic critic on a simple turntable problem that demonstrates the practical benefit of using a trigonometric critic. [...] We evaluated a periodic critic of this form on a toy turntable domain, where the goal is to rotate a flat record to a desired position (see Appendix D for details). We compared it to the DPG baseline from OpenAI (Dhariwal et al., 2017), which uses a neural-network critic capable of addressing complex control tasks. As expected, the learning curves in Figure 1 show that using a periodic critic (F-EPG) leads to faster learning, because it encodes more information about the action space than a generic neural network." |
| Researcher Affiliation | Academia | Matthew Fellows, Kamil Ciosek, and Shimon Whiteson, Department of Computer Science, University of Oxford, United Kingdom. Correspondence to: Matthew Fellows <matthew.fellows@cs.ox.ac.uk>. |
| Pseudocode | Yes | The paper provides pseudocode as Algorithm 1, "Expected Policy Gradient". |
| Open Source Code | No | The paper contains no statement or link indicating that source code for the described method is publicly available. |
| Open Datasets | No | The paper describes a "toy turntable domain" (Appendix D) but provides no access information (link, DOI, or formal citation) for this environment or any associated data as a publicly available dataset. |
| Dataset Splits | No | The paper gives no dataset-split details (e.g., train/validation/test percentages or counts) needed for reproducibility. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud computing resources). |
| Software Dependencies | No | The paper mentions using OpenAI Baselines but provides no version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper describes the turntable domain and the comparison with a DPG baseline, but it does not report hyperparameter values (e.g., learning rate, batch size, optimizer settings) or detailed training configurations for the experiments. |
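The "periodic critic" the review quotes is a Q-function that is, by construction, periodic in an angular action, which is what lets it encode more about the action space than a generic network. A minimal sketch of that idea is below: a critic linear in trigonometric (Fourier) features of the action angle, trained by a TD-style squared-error update. The class and function names, feature construction, and learning rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fourier_features(action_angle, num_frequencies=3):
    """Map an angle (radians) to the features [1, cos(k*a)..., sin(k*a)...]."""
    ks = np.arange(1, num_frequencies + 1)
    return np.concatenate(
        ([1.0], np.cos(ks * action_angle), np.sin(ks * action_angle))
    )

class PeriodicCritic:
    """Hypothetical critic, linear in Fourier features of the action angle."""

    def __init__(self, num_frequencies=3, lr=0.1):
        self.num_frequencies = num_frequencies
        self.lr = lr
        self.w = np.zeros(2 * num_frequencies + 1)  # one weight per feature

    def q_value(self, action_angle):
        return float(self.w @ fourier_features(action_angle, self.num_frequencies))

    def update(self, action_angle, target):
        # One gradient step on the squared TD error (target - Q)^2.
        phi = fourier_features(action_angle, self.num_frequencies)
        td_error = target - self.w @ phi
        self.w += self.lr * td_error * phi

critic = PeriodicCritic()
# The critic is 2*pi-periodic in the action by construction:
a = 0.7
assert abs(critic.q_value(a) - critic.q_value(a + 2 * np.pi)) < 1e-9
```

Because every feature is 2π-periodic, the critic cannot assign different values to physically identical record positions, whereas a generic neural-network critic would have to learn that symmetry from data.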