Recomposing the Reinforcement Learning Building Blocks with Hypernetworks
Authors: Elad Sarafian, Shai Keynan, Sarit Kraus
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate a consistent improvement across different locomotion tasks and different algorithms both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL). We conducted our experiments in the MuJoCo simulator (Todorov et al., 2012) and tested the algorithms on the benchmark environments available in OpenAI Gym (Brockman et al., 2016). The results and the comparison to the baselines are summarized in Fig. 6. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our Hypernetwork PyTorch implementation is found at https://github.com/keynans/HypeRL. |
| Open Datasets | Yes | We conducted our experiments in the MuJoCo simulator (Todorov et al., 2012) and tested the algorithms on the benchmark environments available in OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper states 'All experiments were averaged over 5 seeds' and mentions 'training distribution' for Meta-RL tasks but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) or a cross-validation setup required for full reproduction of data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions PyTorch as the implementation framework and cites 'Ketkar, N. (2017). Introduction to pytorch.' but does not specify the version numbers for PyTorch or any other software libraries or dependencies used, which is necessary for reproducibility. |
| Experiment Setup | Yes | The Hypernetwork training was executed with the baseline loss s.t. we changed only the network's model and adjusted the learning rate to fit the different architecture. The dynamic network f_{w_θ(z)}(x) contains only a single hidden layer of 256, which is smaller than the standard MLP architecture used in many RL papers (Fujimoto et al., 2018; Haarnoja et al., 2018) of 2 hidden layers, each with 256 neurons. |
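To make the Experiment Setup row concrete, below is a minimal PyTorch sketch of the kind of architecture described there: a hypernetwork (primary network) maps the meta variable z (e.g. the state) to the weights of a small dynamic network f_{w_θ(z)}(x) with a single 256-unit hidden layer applied to x (e.g. the action). This is not the authors' released code; the class name, the hypernetwork head sizes, and the example dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HyperQNetwork(nn.Module):
    """Sketch of a hypernetwork Q-function: w_theta(z) generates the weights
    of a dynamic network f_{w_theta(z)}(x) with one hidden layer of 256 units.
    The hypernetwork head sizes below are assumptions, not the paper's values."""

    def __init__(self, z_dim, x_dim, hidden=256):
        super().__init__()
        self.x_dim, self.hidden = x_dim, hidden
        # number of dynamic-network parameters: one hidden layer + scalar output
        n_params = (x_dim * hidden + hidden) + (hidden * 1 + 1)
        # primary (hyper) network producing those parameters from z
        self.hyper = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, n_params),
        )

    def forward(self, z, x):
        p = self.hyper(z)                                    # [B, n_params]
        B, h, d = z.shape[0], self.hidden, self.x_dim
        i = 0
        w1 = p[:, i:i + d * h].view(B, h, d); i += d * h     # hidden-layer weights
        b1 = p[:, i:i + h]; i += h                           # hidden-layer bias
        w2 = p[:, i:i + h].view(B, 1, h); i += h             # output weights
        b2 = p[:, i:i + 1]                                    # output bias
        # dynamic network f_{w_theta(z)}(x): single 256-unit hidden layer
        hid = torch.relu(torch.bmm(w1, x.unsqueeze(-1)).squeeze(-1) + b1)
        return torch.bmm(w2, hid.unsqueeze(-1)).squeeze(-1) + b2

# Usage sketch: Q-values for a batch of states z and actions x (dims are made up)
q_net = HyperQNetwork(z_dim=17, x_dim=6)
q = q_net(torch.randn(32, 17), torch.randn(32, 6))  # -> shape [32, 1]
```

The point of the sketch is the size comparison made in the row above: the per-input dynamic network is a single 256-unit hidden layer, smaller than the standard two-layer 256-unit MLP, while the learned parameters live in the hypernetwork that generates it.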