Hypernetworks for Zero-Shot Transfer in Reinforcement Learning

Authors: Sahand Rezaei-Shoshtari, Charlotte Morissette, François R. Hogan, Gregory Dudek, David Meger

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite.
Researcher Affiliation | Collaboration | 1McGill University, 2Mila Québec AI Institute, 3Samsung AI Center Montreal. srezaei@cim.mcgill.ca
Pseudocode | Yes | Algorithm 1 shows the pseudo-code of our learning framework. (A hedged sketch of the hypernetwork idea follows the table.)
Open Source Code | Yes | Our learning code, generated datasets, and custom continuous control environments, which are built upon DeepMind Control Suite, are publicly available at: https://sites.google.com/view/hyperzero-rl
Open Datasets | Yes | Our learning code, generated datasets, and custom continuous control environments, which are built upon DeepMind Control Suite, are publicly available at: https://sites.google.com/view/hyperzero-rl
Dataset Splits | Yes | To reliably evaluate the zero-shot transfer abilities of HyperZero to novel reward/dynamics settings against the baselines, and to rule out the possibility of selectively choosing train/test tasks, we randomly divide task settings into train (85%) and test (15%) sets.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions using TD3 as the RL algorithm and DeepMind Control Suite for environments but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | RL Training and Dataset Collection. We use TD3 (Fujimoto, Hoof, and Meger 2018) as the RL algorithm that is to be approximated. Each MDP Mi ∈ M, generated by sampling ψi ∼ p(ψ) and µi ∼ p(µ), is used to independently train a standard TD3 agent on proprioceptive states for 1 million steps. Consequently, the final solution is used to generate 10 rollouts to be added to the dataset D. ... Train/Test Split of the Tasks. ... we randomly divide task settings into train (85%) and test (15%) sets. We consequently report the mean and standard deviation of the average return obtained on 5 seeds.
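
The paper's Algorithm 1 is not reproduced on this page. For orientation, here is a minimal sketch, assuming a PyTorch setup, of the core idea the Pseudocode row refers to: a hypernetwork that maps reward/dynamics parameters to the weights of a policy network, so that querying it with unseen task parameters yields a zero-shot policy. The class name, layer sizes, and dimensions below are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

class PolicyHypernetwork(nn.Module):
    """Illustrative hypernetwork: task parameters -> policy weights."""

    def __init__(self, task_dim, state_dim, action_dim, hidden=64):
        super().__init__()
        # Shapes of the generated policy MLP: state -> hidden -> action.
        self.shapes = [(hidden, state_dim), (hidden,),
                       (action_dim, hidden), (action_dim,)]
        n_params = sum(math.prod(s) for s in self.shapes)
        # The hypernetwork itself outputs one flat vector of policy weights.
        self.net = nn.Sequential(nn.Linear(task_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_params))

    def forward(self, task_params, state):
        flat = self.net(task_params)
        # Slice the flat output into the policy's weight and bias tensors.
        chunks, i = [], 0
        for shape in self.shapes:
            n = math.prod(shape)
            chunks.append(flat[i:i + n].view(shape))
            i += n
        w1, b1, w2, b2 = chunks
        # Evaluate the generated policy on the state (2-layer tanh MLP).
        h = torch.relu(state @ w1.T + b1)
        return torch.tanh(h @ w2.T + b2)

# Zero-shot transfer amounts to querying with unseen task parameters.
hyper = PolicyHypernetwork(task_dim=4, state_dim=17, action_dim=6)
action = hyper(torch.randn(4), torch.randn(17))
```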
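Likewise, a minimal sketch of the dataset collection and 85%/15% split quoted in the Dataset Splits and Experiment Setup rows. The `train_td3` and `collect_rollouts` callables are hypothetical placeholders, passed in by the caller, standing in for the 1-million-step TD3 training and 10-rollout generation described above.

```python
import random

def build_dataset(task_settings, train_td3, collect_rollouts,
                  rollouts_per_task=10, seed=0):
    """task_settings: list of (psi, mu) reward/dynamics parameter pairs."""
    rng = random.Random(seed)
    settings = list(task_settings)
    rng.shuffle(settings)
    # Random 85%/15% train/test split over task settings, as in the paper.
    n_train = int(0.85 * len(settings))
    train_tasks, test_tasks = settings[:n_train], settings[n_train:]

    dataset = []
    for psi, mu in train_tasks:
        # Both calls are hypothetical stand-ins, not the authors' API.
        agent = train_td3(psi, mu, steps=1_000_000)
        dataset += collect_rollouts(agent, n=rollouts_per_task)
    return dataset, train_tasks, test_tasks
```

Repeating this over 5 seeds and averaging returns on the held-out test tasks would correspond to the evaluation protocol the paper reports.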