Fast Efficient Hyperparameter Tuning for Policy Gradient Methods

Authors: Supratik Paul, Vitaly Kurin, Shimon Whiteson

NeurIPS 2019 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results across multiple domains and algorithms show that using HOOF to learn these hyperparameter schedules leads to faster learning with improved performance. We evaluate HOOF across a range of simulated continuous control tasks using the MuJoCo OpenAI Gym environments (Brockman et al., 2016). We repeat all experiments across 10 random starts. In all figures solid lines represent the median, and shaded regions the quartiles. Similarly all results in tables represent the median. (An illustrative aggregation snippet appears below the table.)
Researcher Affiliation | Academia | Supratik Paul, Vitaly Kurin, Shimon Whiteson, Department of Computer Science, University of Oxford, {supratik.paul,vitaly.kurin,shimon.whiteson}@cs.ox.ac.uk
Pseudocode | Yes | Algorithm 1: HOOF. (A paraphrased sketch appears below the table.)
Open Source Code | Yes | Details about all hyperparameters can be found in the appendices, and code is available at https://github.com/supratikp/HOOF.
Open Datasets | Yes | To experimentally validate HOOF, we apply it to four simulated continuous control tasks from MuJoCo OpenAI Gym (Brockman et al., 2016): HalfCheetah, Hopper, Ant, and Walker. (An environment-loading sketch appears below the table.)
Dataset Splits | No | The paper refers to 'training run' and 'samples' but does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper states: 'The experiments were made possible by a generous equipment grant from NVIDIA.' While this indicates the hardware vendor, it does not specify a GPU model, CPU type, or any other detailed hardware specification.
Software Dependencies | No | The paper mentions software components such as 'A2C' and 'OpenAI Baselines', and the optimizers 'RMSProp', 'ADAM', and 'SGD', but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | The paper states, 'Details about all hyperparameters can be found in the appendices,' which indicates that specific hyperparameter values (e.g., learning rate α, GAE parameters γ and λ, KL constraint ϵ) are provided in the appendices. It also mentions specific settings such as ϵ = 0.03 for HOOF with A2C and discusses tuning hyperparameters such as α0 and β for meta-gradients.
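
The Research Type row above describes the aggregation protocol: 10 runs per experiment, with solid lines showing the median and shaded regions showing the quartiles. A minimal NumPy sketch of that computation follows; the array shape and placeholder data are illustrative and not drawn from the paper's results.

```python
import numpy as np

# Hypothetical learning curves: one row per random start (10 runs),
# one column per evaluation point. Placeholder data, not paper results.
rng = np.random.default_rng(0)
curves = rng.normal(size=(10, 200)).cumsum(axis=1)

# Median (solid line) and lower/upper quartiles (shaded region),
# computed across runs independently at every evaluation point.
q25, median, q75 = np.percentile(curves, [25, 50, 75], axis=0)
```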
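The Pseudocode row points to Algorithm 1 (HOOF), which is not reproduced in this extract. As a rough paraphrase: at every iteration HOOF collects one batch of trajectories, samples several candidate hyperparameter settings, computes the policy update each candidate would produce, scores each candidate policy with weighted importance sampling on the same batch, and greedily keeps the best. The sketch below follows that description; the candidate count and all helper callables are hypothetical placeholders, not the authors' released implementation.

```python
# Minimal sketch of one HOOF iteration, paraphrased from the paper's
# Algorithm 1. The callables passed in are hypothetical placeholders that
# stand in for the underlying RL machinery.
def hoof_iteration(policy, collect_trajectories, sample_hyperparams,
                   policy_update, wis_estimate, n_candidates=20):
    # One batch of trajectories gathered with the current policy.
    trajectories = collect_trajectories(policy)

    best_value, best_candidate = float("-inf"), None
    for _ in range(n_candidates):
        # Randomly sample a candidate hyperparameter setting
        # (e.g. learning rate, GAE gamma and lambda).
        hyperparams = sample_hyperparams()

        # Candidate policy produced by applying the update with these
        # hyperparameters to the same batch.
        candidate = policy_update(policy, trajectories, hyperparams)

        # Off-policy estimate of the candidate's performance via weighted
        # importance sampling over the already-collected trajectories.
        # (For A2C the paper additionally rejects candidates whose KL
        # divergence from the current policy exceeds a threshold eps.)
        value = wis_estimate(candidate, policy, trajectories)
        if value > best_value:
            best_value, best_candidate = value, candidate

    # The greedily chosen candidate becomes the next policy; repeating this
    # every iteration yields the learned hyperparameter schedule.
    return best_candidate
```

Because every candidate is evaluated on the batch that was already collected, the hyperparameter search adds no extra environment samples, which is the source of the sample efficiency claimed in the paper.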
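The Open Datasets row lists four MuJoCo Gym benchmarks. They can be instantiated with the standard Gym API as shown below; the `-v2` version suffixes and the `Walker2d` environment ID are assumptions based on the Gym releases current around 2019, since this extract does not state the exact environment versions used.

```python
import gym

# The four tasks named in the paper; version suffixes are an assumption.
env_ids = ["HalfCheetah-v2", "Hopper-v2", "Ant-v2", "Walker2d-v2"]

for env_id in env_ids:
    env = gym.make(env_id)
    obs = env.reset()
    # One random-action step, just to confirm the environment loads.
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()
```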