Goal-Conditioned Generators of Deep Policies

Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance.
Researcher Affiliation | Collaboration | ¹The Swiss AI Lab IDSIA/USI/SUPSI, Lugano, Ticino, Switzerland; ²AI Initiative, KAUST, Thuwal, Saudi Arabia; ³NNAISENSE, Lugano, Switzerland
Pseudocode | Yes | Algorithm 1: GoGePo with return commands (a sketch of the return-commanded generator appears after the table)
Open Source Code | Yes | Our code is public. Our implementations are publicly available at https://github.com/IDSIA/GoGePo
Open Datasets | Yes | We evaluate our method on continuous control tasks from the MuJoCo (Todorov, Erez, and Tassa 2012) suite.
Dataset Splits | No | The paper describes training procedures and data collection through interaction with environments but does not specify explicit training/validation/test dataset splits with percentages or sample counts for a fixed dataset.
Hardware Specification | Yes | This work was supported by... the Swiss National Supercomputing Centre (CSCS, projects: s1090, s1154). We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award and IBM for donating a Minsky machine.
Software Dependencies | No | The paper mentions using the MuJoCo suite and various RL algorithms (DDPG, SAC, TD3, ARS, UDRL) but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | In the experiments, all policies are MLPs with two hidden layers, each having 256 neurons. Our method uses the same set of hyperparameters in all environments. For ARS and UDRL, we tune a set of hyperparameters separately for each environment (step size, population size, and noise for ARS; nonlinearity, learning rate, and the "last few" parameter for UDRL). Details can be found in Appendix A. (A minimal sketch of this policy architecture appears after the table.)
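
The Experiment Setup row states that all policies are MLPs with two hidden layers of 256 neurons each, evaluated on MuJoCo continuous control tasks. The sketch below is only an illustration of that stated architecture; the observation/action sizes and the Tanh nonlinearities are assumptions not taken from the paper.

```python
# Minimal sketch of the policy architecture quoted above: an MLP with two
# hidden layers of 256 units. Obs/action sizes and Tanh are assumptions.
import torch
import torch.nn as nn

def make_policy(obs_dim: int, act_dim: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(obs_dim, 256), nn.Tanh(),
        nn.Linear(256, 256), nn.Tanh(),
        nn.Linear(256, act_dim),
    )

policy = make_policy(obs_dim=17, act_dim=6)  # HalfCheetah-like sizes (assumption)
n_policy_params = sum(p.numel() for p in policy.parameters())
print(n_policy_params)  # size of the parameter vector a generator would output
```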
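The Research Type and Pseudocode rows describe the core idea of Algorithm 1 (GoGePo with return commands): a learned generator maps a commanded return to the parameters of a policy that achieves it. The authors' actual algorithm is in the linked repository; the following is only a minimal PyTorch-style sketch of that idea under stated assumptions. The class name `PolicyGenerator`, the hidden width, and the simple MSE regression on (achieved return, policy parameters) pairs are illustrative choices, not the paper's exact training objective.

```python
# Minimal sketch (assumptions noted above): a generator mapping a scalar
# return command to a flat vector of policy parameters, trained by regression
# on (achieved_return, policy_parameters) pairs collected during training.
import torch
import torch.nn as nn

class PolicyGenerator(nn.Module):  # hypothetical name
    def __init__(self, n_policy_params: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_policy_params),
        )

    def forward(self, return_command: torch.Tensor) -> torch.Tensor:
        # return_command: shape (batch, 1); output: flat policy parameter vectors
        return self.net(return_command)

def generator_update(generator, optimizer, commands, target_params):
    """One supervised step: push generated parameters toward parameters of
    policies that actually achieved the commanded returns (sketch only)."""
    pred = generator(commands)                   # (batch, n_policy_params)
    loss = ((pred - target_params) ** 2).mean()  # simple MSE surrogate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At deployment time, commanding the highest return seen during training would ask the generator for a correspondingly strong policy, which is the sense in which the abstract says the generator "can produce policies that achieve any return seen during training."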