Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Hypernetworks for Zero-Shot Transfer in Reinforcement Learning
Authors: Sahand Rezaei-Shoshtari, Charlotte Morissette, Francois R. Hogan, Gregory Dudek, David Meger
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. |
| Researcher Affiliation | Collaboration | ¹McGill University, ²Mila - Québec AI Institute, ³Samsung AI Center Montreal (EMAIL) |
| Pseudocode | Yes | Algorithm 1 shows the pseudo-code of our learning framework. |
| Open Source Code | Yes | Our learning code, generated datasets, and custom continuous control environments, which are built upon DeepMind Control Suite, are publicly available at: https://sites.google.com/view/hyperzero-rl |
| Open Datasets | Yes | Our learning code, generated datasets, and custom continuous control environments, which are built upon DeepMind Control Suite, are publicly available at: https://sites.google.com/view/hyperzero-rl |
| Dataset Splits | Yes | To reliably evaluate the zero-shot transfer abilities of HyperZero to novel reward/dynamics settings against the baselines, and to rule out the possibility of selectively choosing train/test tasks, we randomly divide task settings into train (85%) and test (15%) sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using TD3 as the RL algorithm and DeepMind Control Suite for environments but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | RL Training and Dataset Collection. We use TD3 (Fujimoto, Hoof, and Meger 2018) as the RL algorithm that is to be approximated. Each MDP $\mathcal{M}_i \in \mathcal{M}$, generated by sampling $\psi_i \sim p(\psi)$ and $\mu_i \sim p(\mu)$, is used to independently train a standard TD3 agent on proprioceptive states for 1 million steps. Consequently, the final solution is used to generate 10 rollouts to be added to the dataset $\mathcal{D}$. ... Train/Test Split of the Tasks. ... we randomly divide task settings into train (85%) and test (15%) sets. We consequently report the mean and standard deviation of the average return obtained on 5 seeds. |
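The split-and-report protocol quoted in the Dataset Splits and Experiment Setup rows can be illustrated with a short Python sketch. This is not the authors' released code, and several details are assumptions made only for illustration: the (reward parameter, dynamics parameter) grid is hypothetical, as are the fixed shuffle seed, the rounding of the train count, the reading of "average return" as the mean over test-task episodes within each seed, and the use of the sample standard deviation. The returns are made-up placeholders, not results from the paper.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_task_settings(
    settings: Sequence[T], train_frac: float = 0.85, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly partition task settings into disjoint train/test sets (85%/15% by default)."""
    rng = random.Random(seed)  # local RNG so the split is reproducible
    shuffled = list(settings)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))  # assumption: round to nearest integer
    return shuffled[:n_train], shuffled[n_train:]


def summarize_over_seeds(returns: Dict[int, Sequence[float]]) -> Tuple[float, float]:
    """Average returns within each seed, then report mean and sample std across seeds.

    `returns` maps a seed to the episode returns collected on the test tasks for that seed.
    """
    per_seed_avg = [statistics.mean(r) for r in returns.values()]
    return statistics.mean(per_seed_avg), statistics.stdev(per_seed_avg)


if __name__ == "__main__":
    # Hypothetical grid of (reward parameter, dynamics parameter) task settings.
    settings = [(speed, scale) for speed in range(1, 11) for scale in (0.5, 1.0)]
    train, test = split_task_settings(settings, seed=0)
    print(len(train), len(test))  # 17 3

    # Placeholder returns for 5 seeds (made-up numbers, not results from the paper).
    fake_returns = {s: [800.0 + d, 800.0 + d] for s, d in enumerate([0, 20, -10, 10, -20])}
    mean, std = summarize_over_seeds(fake_returns)
    print(f"{mean:.1f} +/- {std:.1f}")  # 800.0 +/- 15.8
```

Splitting over task settings rather than over rollouts keeps every test setting unseen during training, which is the property a zero-shot transfer evaluation relies on. Fixing the shuffle seed makes the split itself reproducible.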