Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

Authors: Elad Sarafian, Shai Keynan, Sarit Kraus

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate a consistent improvement across different locomotion tasks and different algorithms both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL). We conducted our experiments in the MuJoCo simulator (Todorov et al., 2012) and tested the algorithms on the benchmark environments available in OpenAI Gym (Brockman et al., 2016). The results and the comparison to the baselines are summarized in Fig. 6.
Researcher Affiliation | Academia | Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our Hypernetwork PyTorch implementation is found at https://github.com/keynans/HypeRL.
Open Datasets | Yes | We conducted our experiments in the MuJoCo simulator (Todorov et al., 2012) and tested the algorithms on the benchmark environments available in OpenAI Gym (Brockman et al., 2016). (An environment-loading sketch appears after the table.)
Dataset Splits | No | The paper states 'All experiments were averaged over 5 seeds' and mentions a 'training distribution' for the Meta-RL tasks, but it does not specify explicit training/validation/test splits (e.g., percentages or sample counts) or a cross-validation setup, which would be needed to fully reproduce the data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework and cites 'Ketkar, N. (2017). Introduction to pytorch.', but it does not specify version numbers for PyTorch or any other software libraries or dependencies, which is necessary for reproducibility.
Experiment Setup | Yes | The Hypernetwork training was executed with the baseline loss, such that we changed only the network model and adjusted the learning rate to fit the different architecture. The dynamic network f_{w_θ(z)}(x) contains only a single hidden layer of 256 neurons, which is smaller than the standard MLP architecture of 2 hidden layers with 256 neurons each used in many RL papers (Fujimoto et al., 2018; Haarnoja et al., 2018). (A minimal sketch of this architecture appears after the table.)
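
No fixed dataset is involved; the "open data" here are the MuJoCo locomotion benchmarks shipped with OpenAI Gym, as quoted in the Open Datasets row. For orientation, below is a minimal sketch of instantiating such an environment with the classic Gym API; the specific environment name and API version are assumptions rather than details stated in the paper.

```python
import gym  # classic OpenAI Gym API (pre-0.26 reset/step signatures)

# HalfCheetah is one common MuJoCo locomotion benchmark; the exact set of
# environments and versions evaluated in the paper is not restated here.
env = gym.make("HalfCheetah-v2")

obs = env.reset()
for _ in range(5):
    action = env.action_space.sample()          # placeholder random policy
    obs, reward, done, info = env.step(action)  # 4-tuple in the classic API
    if done:
        obs = env.reset()
env.close()
```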
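
The Experiment Setup row describes a primary (hyper) network that emits the weights w_θ(z) of a dynamic network f_{w_θ(z)}(x) with a single hidden layer of 256 units, replacing the standard 2x256 MLP critic. The PyTorch sketch below illustrates that recomposition for a Q-function, with the state feeding the primary network and the action feeding the dynamic network; the primary network's layer sizes and the weight-splitting scheme are illustrative assumptions, not the authors' exact implementation (see their repository for that).

```python
import torch
import torch.nn as nn

class HyperQ(nn.Module):
    """Sketch of a hypernetwork Q-function: a primary network maps the state z
    to the weights w(z) of a dynamic network f_{w(z)}(x) that processes the
    action x through a single hidden layer of 256 units (as quoted above).
    The primary network's own sizes are illustrative assumptions."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.action_dim, self.hidden = action_dim, hidden
        # Dynamic-network parameter count: one hidden layer plus a scalar head.
        n_dyn = action_dim * hidden + hidden + hidden + 1
        self.primary = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, n_dyn),
        )

    def forward(self, state, action):
        b, h, a = state.shape[0], self.hidden, self.action_dim
        w = self.primary(state)                           # (b, n_dyn)
        # Split the flat vector into the dynamic network's weights and biases.
        w1 = w[:, : a * h].view(b, h, a)                  # hidden-layer weights
        b1 = w[:, a * h : a * h + h]                      # hidden-layer bias
        w2 = w[:, a * h + h : a * h + 2 * h]              # output weights
        b2 = w[:, a * h + 2 * h :]                        # output bias, (b, 1)
        # Dynamic network applied per sample (batched matrix-vector product).
        hdn = torch.relu(torch.bmm(w1, action.unsqueeze(-1)).squeeze(-1) + b1)
        return (w2 * hdn).sum(dim=-1, keepdim=True) + b2  # Q(s, a), shape (b, 1)
```

Such a module can be dropped into a TD3- or SAC-style critic loss in place of the usual MLP, with only the learning rate adjusted, which matches the quoted training setup.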