Recurrent Hypernetworks are Surprisingly Strong in Meta-RL

Authors: Jacob Beck, Risto Vuorio, Zheng Xiong, Shimon Whiteson

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we conduct an empirical investigation. While we likewise find that a recurrent network can achieve strong performance, we demonstrate that the use of hypernetworks is crucial to maximizing their potential. Surprisingly, when combined with hypernetworks, the recurrent baselines that are far simpler than existing specialized methods actually achieve the strongest performance of all methods evaluated.
Researcher Affiliation | Academia | Jacob Beck, Department of Computer Science, University of Oxford, United Kingdom, jacob_beck@alumni.brown.edu; Risto Vuorio, Department of Computer Science, University of Oxford, United Kingdom, risto.vuorio@keble.ox.ac.uk; Zheng Xiong, Department of Computer Science, University of Oxford, United Kingdom, zheng.xiong@cs.ox.ac.uk; Shimon Whiteson, Department of Computer Science, University of Oxford, United Kingdom, shimon.whiteson@cs.ox.ac.uk
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | We provide code at https://github.com/jacooba/hyper.
Open Datasets | No | The paper refers to standard benchmarks like the MuJoCo and Minecraft environments but does not provide concrete access information (link, DOI, or specific citation with authors/year) for the datasets themselves. It only mentions the environments.
Dataset Splits | No | The paper mentions evaluating over three seeds and tuning over five learning rates but does not specify train/validation/test splits with percentages, sample counts, or citations to predefined splits. It refers to 'training the multi-task policy' and 'meta-RL policy training time' but not explicit data splits.
Hardware Specification | Yes | Experiments were run on four to eight machines simultaneously, each with eight GPUs, ranging from NVIDIA GeForce GTX 1080Ti to NVIDIA RTX A5000.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We tune each baseline over five learning rates for the policy, [3e-3, 1e-3, 3e-4, 1e-4, 3e-5], for three seeds each. We use a learning rate for the task inference modules of 0.001... For the state embedding size (passed to the policy) we chose 256 and for the sizes of the MLP policy we chose 256, followed by 128... For the state embedding size passed to the trajectory encoder, we used 32... And for all RNNs, we use a single GRU layer of size 256.
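The "Research Type" row above summarizes the paper's central finding: a recurrent network whose policy head is generated by a hypernetwork. The sketch below is a minimal illustration of that idea, not the authors' implementation from https://github.com/jacooba/hyper; the layer sizes follow the "Experiment Setup" row, the head is simplified to two generated layers, and all class and variable names are assumptions.

```python
# Minimal sketch (assumption, not the authors' code): a GRU processes the
# trajectory and a hypernetwork maps its hidden state to the weights of a small
# MLP policy head. Sizes follow the Experiment Setup row above.
import torch
import torch.nn as nn


class RecurrentHyperPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, state_embed=256, gru_size=256, head_hidden=128):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, state_embed)             # state embedding passed to the policy
        self.gru = nn.GRU(state_embed, gru_size, batch_first=True)   # single GRU layer of size 256
        # Hypernetwork heads: GRU hidden state -> weights/biases of the generated MLP.
        self.w1 = nn.Linear(gru_size, head_hidden * state_embed)
        self.b1 = nn.Linear(gru_size, head_hidden)
        self.w2 = nn.Linear(gru_size, act_dim * head_hidden)
        self.b2 = nn.Linear(gru_size, act_dim)
        self.state_embed, self.head_hidden, self.act_dim = state_embed, head_hidden, act_dim

    def forward(self, obs, h=None):
        # obs: (batch, time, obs_dim); h: optional GRU hidden state carried across calls.
        x = torch.relu(self.obs_embed(obs))
        z, h = self.gru(x, h)  # per-timestep memory / task-inference features
        # Generate per-timestep policy-head parameters from the GRU output.
        W1 = self.w1(z).view(*z.shape[:2], self.head_hidden, self.state_embed)
        W2 = self.w2(z).view(*z.shape[:2], self.act_dim, self.head_hidden)
        # Apply the generated MLP to the state embedding.
        hid = torch.relu(torch.einsum("btoi,bti->bto", W1, x) + self.b1(z))
        logits = torch.einsum("btoi,bti->bto", W2, hid) + self.b2(z)
        return logits, h


# Usage with toy dimensions: a batch of 4 trajectories, 10 timesteps each.
policy = RecurrentHyperPolicy(obs_dim=17, act_dim=6)
logits, hidden = policy(torch.randn(4, 10, 17))
print(logits.shape)  # torch.Size([4, 10, 6])
```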
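The "Experiment Setup" row lists the reported tuning protocol and layer sizes. The snippet below only gathers those reported values into an illustrative sweep; the key names and the launch loop are assumptions, not the repository's actual configuration format.

```python
# Reported values from the Experiment Setup row, collected into an illustrative
# sweep. Only the numbers come from the paper; key names are assumptions.
from itertools import product

policy_lrs = [3e-3, 1e-3, 3e-4, 1e-4, 3e-5]  # five policy learning rates tuned
seeds = [0, 1, 2]                            # three seeds per learning rate

architecture = {
    "task_inference_lr": 1e-3,        # learning rate for the task-inference modules
    "state_embed_policy": 256,        # state embedding size passed to the policy
    "mlp_policy_sizes": [256, 128],   # hidden sizes of the MLP policy
    "state_embed_encoder": 32,        # state embedding size passed to the trajectory encoder
    "rnn": {"type": "GRU", "layers": 1, "size": 256},
}

for lr, seed in product(policy_lrs, seeds):
    run_config = {**architecture, "policy_lr": lr, "seed": seed}
    print(run_config)  # in practice, each config would launch a separate training run
```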