Recurrent Hypernetworks are Surprisingly Strong in Meta-RL
Authors: Jacob Beck, Risto Vuorio, Zheng Xiong, Shimon Whiteson
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we conduct an empirical investigation. While we likewise find that a recurrent network can achieve strong performance, we demonstrate that the use of hypernetworks is crucial to maximizing its potential. Surprisingly, when combined with hypernetworks, recurrent baselines that are far simpler than existing specialized methods actually achieve the strongest performance of all methods evaluated. (An illustrative sketch of this recurrent-hypernetwork architecture appears after the table.) |
| Researcher Affiliation | Academia | Jacob Beck, Department of Computer Science, University of Oxford, United Kingdom (jacob_beck@alumni.brown.edu); Risto Vuorio, Department of Computer Science, University of Oxford, United Kingdom (risto.vuorio@keble.ox.ac.uk); Zheng Xiong, Department of Computer Science, University of Oxford, United Kingdom (zheng.xiong@cs.ox.ac.uk); Shimon Whiteson, Department of Computer Science, University of Oxford, United Kingdom (shimon.whiteson@cs.ox.ac.uk) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We provide code at https://github.com/jacooba/hyper. |
| Open Datasets | No | The paper refers to standard benchmarks like MuJoCo and Minecraft environments but does not provide concrete access information (link, DOI, or specific citation with authors/year) for the datasets themselves. It only mentions the environments. |
| Dataset Splits | No | The paper mentions evaluating over three seeds and tuning over five learning rates but does not specify train/validation/test splits with percentages, sample counts, or citations to predefined splits. It refers to 'training the multi-task policy' and 'meta-RL policy training time' but not explicit data splits. |
| Hardware Specification | Yes | Experiments were run on four to eight machines simultaneously, each with eight GPUs, ranging from NVIDIA GeForce GTX 1080Ti to NVIDIA RTX A5000. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We tune each baseline over five learning rates for the policy, [3e-3, 1e-3, 3e-4, 1e-4, 3e-5], for three seeds each. We use a learning rate for the task inference modules of 0.001... For the state embedding size (passed to the policy) we chose 256 and for the sizes of the MLP policy we chose 256, followed by 128... For the state embedding size passed to the trajectory encoder, we used 32... And for all RNNs, we use a single GRU layer of size 256. (A hedged sketch of this tuning protocol also appears after the table.) |
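
To make the reported architecture concrete, below is a minimal sketch of how a recurrent hypernetwork policy with the quoted sizes could be wired up: a single GRU layer of size 256 over a 32-dimensional state embedding (plus previous action and reward), and a hypernetwork that maps the GRU's task latent to the weights of a 256, 128 MLP policy head applied to a 256-dimensional state embedding. This is an illustrative PyTorch reconstruction, not the authors' implementation; the class name, input format, and the choice to output raw action preferences are assumptions.

```python
import torch
import torch.nn as nn


class RecurrentHyperPolicy(nn.Module):
    """Illustrative sketch (not the authors' code): a GRU task encoder whose
    output parameterizes, via a hypernetwork, the weights of a small MLP
    policy head. Layer sizes follow the paper's reported hyperparameters."""

    def __init__(self, state_dim, action_dim, gru_size=256,
                 policy_state_embed=256, enc_state_embed=32,
                 hidden_sizes=(256, 128)):
        super().__init__()
        # Separate state embeddings for the trajectory encoder and the policy.
        self.enc_embed = nn.Linear(state_dim, enc_state_embed)
        self.policy_embed = nn.Linear(state_dim, policy_state_embed)
        # Single GRU layer over (embedded state, previous action, reward).
        self.gru = nn.GRU(enc_state_embed + action_dim + 1, gru_size,
                          batch_first=True)
        # Shapes of the generated policy head: 256 -> 256 -> 128 -> action_dim.
        dims = (policy_state_embed,) + tuple(hidden_sizes) + (action_dim,)
        self.layer_shapes = [(dims[i + 1], dims[i]) for i in range(len(dims) - 1)]
        n_params = sum(out * inp + out for out, inp in self.layer_shapes)
        # Hypernetwork: maps the task latent to every policy-head parameter.
        self.hyper = nn.Linear(gru_size, n_params)

    def forward(self, states, prev_actions, rewards, hidden=None):
        # states: [B, T, state_dim]; prev_actions: [B, T, action_dim]; rewards: [B, T, 1]
        enc_in = torch.cat([self.enc_embed(states), prev_actions, rewards], dim=-1)
        latents, hidden = self.gru(enc_in, hidden)        # [B, T, gru_size]
        params = self.hyper(latents)                      # [B, T, n_params]
        x = torch.relu(self.policy_embed(states))         # [B, T, 256]
        offset = 0
        for i, (out_dim, in_dim) in enumerate(self.layer_shapes):
            w = params[..., offset:offset + out_dim * in_dim]
            w = w.reshape(*params.shape[:-1], out_dim, in_dim)
            offset += out_dim * in_dim
            b = params[..., offset:offset + out_dim]
            offset += out_dim
            x = torch.einsum('btoi,bti->bto', w, x) + b   # generated layer, per timestep
            if i < len(self.layer_shapes) - 1:
                x = torch.relu(x)
        return x, hidden  # raw action preferences per timestep


# Quick shape check with made-up dimensions.
policy = RecurrentHyperPolicy(state_dim=17, action_dim=6)
out, _ = policy(torch.randn(4, 10, 17), torch.randn(4, 10, 6), torch.randn(4, 10, 1))
print(out.shape)  # torch.Size([4, 10, 6])
```

The distinguishing design choice is that the task latent generates the policy head's weights, rather than being concatenated to the state as in a plain recurrent baseline.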
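
The tuning protocol in the Experiment Setup row (five policy learning rates, three seeds each, with a fixed 0.001 learning rate for the task-inference modules) could be scripted along these lines; `run_training` is a hypothetical placeholder, not a function from the authors' repository.

```python
import itertools
import random

# The reported grid: five policy learning rates, three seeds each.
POLICY_LRS = [3e-3, 1e-3, 3e-4, 1e-4, 3e-5]
SEEDS = [0, 1, 2]
TASK_INFERENCE_LR = 1e-3  # fixed, per the reported setup


def run_training(policy_lr: float, task_inference_lr: float, seed: int) -> float:
    """Hypothetical stand-in for one meta-RL training run, returning a final return.
    Replace with a call into actual training code (e.g. the authors' repository)."""
    random.seed(seed)
    return random.random()  # placeholder metric so the sketch runs end to end


results = {
    (lr, seed): run_training(lr, TASK_INFERENCE_LR, seed)
    for lr, seed in itertools.product(POLICY_LRS, SEEDS)
}
# Pick the policy learning rate with the best mean return over seeds.
best_lr = max(POLICY_LRS, key=lambda lr: sum(results[(lr, s)] for s in SEEDS) / len(SEEDS))
print(best_lr)
```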