Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

Authors: Elad Sarafian, Shai Keynan, Sarit Kraus

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate a consistent improvement across different locomotion tasks and different algorithms both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL). We conducted our experiments in the MuJoCo simulator (Todorov et al., 2012) and tested the algorithms on the benchmark environments available in OpenAI Gym (Brockman et al., 2016). The results and the comparison to the baselines are summarized in Fig. 6.
Researcher Affiliation | Academia | Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our Hypernetwork PyTorch implementation is found at https://github.com/keynans/HypeRL.
Open Datasets | Yes | We conducted our experiments in the MuJoCo simulator (Todorov et al., 2012) and tested the algorithms on the benchmark environments available in OpenAI Gym (Brockman et al., 2016). (An environment-loading sketch appears after the table.)
Dataset Splits | No | The paper states 'All experiments were averaged over 5 seeds' and mentions a 'training distribution' for the Meta-RL tasks, but it does not specify explicit training/validation/test splits (e.g., percentages or sample counts) or a cross-validation setup, which would be needed to fully reproduce the data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework and cites 'Ketkar, N. (2017). Introduction to pytorch.', but it does not specify version numbers for PyTorch or any other software libraries or dependencies, which is necessary for reproducibility.
Experiment Setup | Yes | The Hypernetwork training was executed with the baseline loss, such that we changed only the network model and adjusted the learning rate to fit the different architecture. The dynamic network f_{w_θ(z)}(x) contains only a single hidden layer of 256 neurons, which is smaller than the standard MLP architecture of 2 hidden layers with 256 neurons each used in many RL papers (Fujimoto et al., 2018; Haarnoja et al., 2018). (A minimal sketch of this architecture appears after the table.)
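
No fixed dataset is involved; the "open data" here are the MuJoCo locomotion benchmarks shipped with OpenAI Gym, as quoted in the Open Datasets row. For orientation, below is a minimal sketch of instantiating such an environment with the classic Gym API; the specific environment name and API version are assumptions rather than details stated in the paper.

```python
import gym  # classic OpenAI Gym API (pre-0.26 reset/step signatures)

# HalfCheetah is one common MuJoCo locomotion benchmark; the exact set of
# environments and versions evaluated in the paper is not restated here.
env = gym.make("HalfCheetah-v2")

obs = env.reset()
for _ in range(5):
    action = env.action_space.sample()          # placeholder random policy
    obs, reward, done, info = env.step(action)  # 4-tuple in the classic API
    if done:
        obs = env.reset()
env.close()
```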
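
The Experiment Setup row describes a primary (hyper) network that emits the weights w_θ(z) of a dynamic network f_{w_θ(z)}(x) with a single hidden layer of 256 units, replacing the standard 2x256 MLP critic. The PyTorch sketch below illustrates that recomposition for a Q-function, with the state feeding the primary network and the action feeding the dynamic network; the primary network's layer sizes and the weight-splitting scheme are illustrative assumptions, not the authors' exact implementation (see their repository for that).

```python
import torch
import torch.nn as nn

class HyperQ(nn.Module):
    """Sketch of a hypernetwork Q-function: a primary network maps the state z
    to the weights w(z) of a dynamic network f_{w(z)}(x) that processes the
    action x through a single hidden layer of 256 units (as quoted above).
    The primary network's own sizes are illustrative assumptions."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.action_dim, self.hidden = action_dim, hidden
        # Dynamic-network parameter count: one hidden layer plus a scalar head.
        n_dyn = action_dim * hidden + hidden + hidden + 1
        self.primary = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, n_dyn),
        )

    def forward(self, state, action):
        b, h, a = state.shape[0], self.hidden, self.action_dim
        w = self.primary(state)                           # (b, n_dyn)
        # Split the flat vector into the dynamic network's weights and biases.
        w1 = w[:, : a * h].view(b, h, a)                  # hidden-layer weights
        b1 = w[:, a * h : a * h + h]                      # hidden-layer bias
        w2 = w[:, a * h + h : a * h + 2 * h]              # output weights
        b2 = w[:, a * h + 2 * h :]                        # output bias, (b, 1)
        # Dynamic network applied per sample (batched matrix-vector product).
        hdn = torch.relu(torch.bmm(w1, action.unsqueeze(-1)).squeeze(-1) + b1)
        return (w2 * hdn).sum(dim=-1, keepdim=True) + b2  # Q(s, a), shape (b, 1)
```

Such a module can be dropped into a TD3- or SAC-style critic loss in place of the usual MLP, with only the learning rate adjusted, which matches the quoted training setup.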