Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Recurrent Hypernetworks are Surprisingly Strong in Meta-RL
Authors: Jacob Beck, Risto Vuorio, Zheng Xiong, Shimon Whiteson
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we conduct an empirical investigation. While we likewise find that a recurrent network can achieve strong performance, we demonstrate that the use of hypernetworks is crucial to maximizing their potential. Surprisingly, when combined with hypernetworks, the recurrent baselines, which are far simpler than existing specialized methods, actually achieve the strongest performance of all methods evaluated. |
| Researcher Affiliation | Academia | Jacob Beck, Department of Computer Science, University of Oxford, United Kingdom, EMAIL; Risto Vuorio, Department of Computer Science, University of Oxford, United Kingdom, EMAIL; Zheng Xiong, Department of Computer Science, University of Oxford, United Kingdom, EMAIL; Shimon Whiteson, Department of Computer Science, University of Oxford, United Kingdom, EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We provide code at https://github.com/jacooba/hyper. |
| Open Datasets | No | The paper refers to standard benchmarks like MuJoCo and Minecraft environments but does not provide concrete access information (link, DOI, or specific citation with authors/year) for the datasets themselves. It only mentions the environments. |
| Dataset Splits | No | The paper mentions evaluating over three seeds and tuning over five learning rates but does not specify train/validation/test splits with percentages, sample counts, or citations to predefined splits. It refers to 'training the multi-task policy' and 'meta-RL policy training time' but not explicit data splits. |
| Hardware Specification | Yes | Experiments were run on four to eight machines simultaneously, each with eight GPUs, ranging from NVIDIA GeForce GTX 1080Ti to NVIDIA RTX A5000. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We tune each baseline over five learning rates for the policy, [3e-3, 1e-3, 3e-4, 1e-4, 3e-5], for three seeds each. We use a learning rate for the task inference modules of 0.001... For the state embedding size (passed to the policy) we chose 256 and for the sizes of the MLP policy we chose 256, followed by 128... For the state embedding size passed to the trajectory encoder, we used 32... And for all RNNs, we use a single GRU layer of size 256. |
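To make the architecture described in the table rows concrete, the following is a minimal numpy sketch of the paper's core idea: a recurrent (GRU) encoder summarizes the trajectory into a task representation, and a hypernetwork maps that representation to the weights of the policy head. All dimensions, function names, and the single-layer policy head here are illustrative assumptions, shrunk well below the paper's reported sizes (GRU hidden 256; MLP policy 256 followed by 128), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, shrunk dimensions (the paper reports GRU hidden size 256
# and an MLP policy of sizes 256 then 128).
STATE_DIM, HIDDEN, POLICY_IN, POLICY_OUT = 8, 16, 8, 4


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init(shape):
    return rng.normal(0.0, 0.1, shape)


def gru_cell(x, h, W):
    """One standard GRU step: update gate z, reset gate r, candidate n."""
    z = sigmoid(W["Wz"] @ x + W["Uz"] @ h)
    r = sigmoid(W["Wr"] @ x + W["Ur"] @ h)
    n = np.tanh(W["Wn"] @ x + W["Un"] @ (r * h))
    return (1.0 - z) * n + z * h


# Trajectory-encoder (GRU) parameters.
gru_W = {k: init((HIDDEN, STATE_DIM)) for k in ("Wz", "Wr", "Wn")}
gru_W.update({k: init((HIDDEN, HIDDEN)) for k in ("Uz", "Ur", "Un")})

# Hypernetwork: a linear map from the GRU's task representation h to the
# flattened weights and bias of a one-layer policy head.
n_policy_params = POLICY_OUT * POLICY_IN + POLICY_OUT
hyper_W = init((n_policy_params, HIDDEN))


def policy(state, h):
    """Generate policy weights from h, then act on the current state."""
    params = hyper_W @ h
    W = params[: POLICY_OUT * POLICY_IN].reshape(POLICY_OUT, POLICY_IN)
    b = params[POLICY_OUT * POLICY_IN:]
    return np.tanh(W @ state + b)


# Roll the GRU over a short trajectory, then act with the generated weights.
h = np.zeros(HIDDEN)
for _ in range(5):
    h = gru_cell(rng.normal(size=STATE_DIM), h, gru_W)
action = policy(rng.normal(size=STATE_DIM), h)
```

The key design point is that the policy's weights are not fixed parameters: they are regenerated from the recurrent task representation, so the same hypernetwork can produce a different effective policy for each task context.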