Fast Efficient Hyperparameter Tuning for Policy Gradient Methods
Authors: Supratik Paul, Vitaly Kurin, Shimon Whiteson
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results across multiple domains and algorithms show that using HOOF to learn these hyperparameter schedules leads to faster learning with improved performance. We evaluate HOOF across a range of simulated continuous control tasks using the MuJoCo OpenAI Gym environments (Brockman et al., 2016). We repeat all experiments across 10 random starts. In all figures solid lines represent the median, and shaded regions the quartiles. Similarly all results in tables represent the median. |
| Researcher Affiliation | Academia | Supratik Paul, Vitaly Kurin, Shimon Whiteson Department of Computer Science University of Oxford {supratik.paul,vitaly.kurin,shimon.whiteson}@cs.ox.ac.uk |
| Pseudocode | Yes | Algorithm 1 HOOF |
| Open Source Code | Yes | Details about all hyperparameters can be found in the appendices, and code is available at https://github.com/supratikp/HOOF. |
| Open Datasets | Yes | To experimentally validate HOOF, we apply it to four simulated continuous control tasks from MuJoCo OpenAI Gym (Brockman et al., 2016): HalfCheetah, Hopper, Ant, and Walker. |
| Dataset Splits | No | The paper refers to 'training run' and 'samples' but does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper states: 'The experiments were made possible by a generous equipment grant from NVIDIA.' While this indicates the brand of hardware, it does not specify any particular GPU model, CPU type, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions software components like 'A2C', 'OpenAI Baselines', 'RMSProp', 'ADAM', and 'SGD' as optimizers, but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | The paper states, 'Details about all hyperparameters can be found in the appendices,' which indicates that specific hyperparameter values (e.g., learning rate α, GAE parameters γ and λ, KL constraint ϵ) are provided in the appendices. It also mentions specific settings like 'ϵ = 0.03' for HOOF with A2C and discusses tuning hyperparameters like α0 and β for meta-gradients. |
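The 'Pseudocode' and 'Experiment Setup' rows above reference Algorithm 1 (HOOF) and the KL constraint ϵ = 0.03 used with A2C. As a rough, non-authoritative illustration of the selection step the paper describes (scoring candidate hyperparameter settings on already-collected trajectories via weighted importance sampling, subject to a KL constraint, so no extra environment interaction is needed), here is a minimal Python sketch. The trajectory and policy interfaces (`policy_update`, `log_prob`, `total_return`, `behaviour_log_prob`, `kl_from_behaviour`) are hypothetical placeholders for this sketch, not the API of the linked repository.

```python
import numpy as np

def hoof_select_hyperparams(trajectories, policy_update, candidate_etas, kl_limit=0.03):
    """Sketch of a HOOF-style hyperparameter selection step.

    After collecting `trajectories` with the current (behaviour) policy, each
    candidate hyperparameter setting `eta` is used to compute a candidate
    updated policy. The candidate's value is estimated with weighted
    importance sampling (WIS) over the same trajectories, and the candidate
    with the highest estimate inside the KL trust region is returned.
    """
    best_eta, best_value = None, -np.inf
    for eta in candidate_etas:
        # Hypothetical helper: applies the policy-gradient update with
        # hyperparameters `eta` and returns the resulting candidate policy.
        candidate_policy = policy_update(trajectories, eta)

        # Importance weight of each whole trajectory under the candidate
        # policy relative to the behaviour policy (sums of log-probs assumed).
        weights = np.array([
            np.exp(candidate_policy.log_prob(traj) - traj.behaviour_log_prob)
            for traj in trajectories
        ])
        returns = np.array([traj.total_return for traj in trajectories])

        # Weighted importance sampling estimate of the candidate's value.
        wis_value = np.sum(weights * returns) / np.sum(weights)

        # Discard candidates that move too far from the behaviour policy.
        if candidate_policy.kl_from_behaviour(trajectories) > kl_limit:
            continue
        if wis_value > best_value:
            best_eta, best_value = eta, wis_value
    return best_eta
```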