Goal-Conditioned Generators of Deep Policies

Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance.
Researcher Affiliation | Collaboration | ¹The Swiss AI Lab IDSIA/USI/SUPSI, Lugano, Ticino, Switzerland; ²AI Initiative, KAUST, Thuwal, Saudi Arabia; ³NNAISENSE, Lugano, Switzerland
Pseudocode | Yes | Algorithm 1: GoGePo with return commands (a sketch of the return-commanded generator appears after the table)
Open Source Code | Yes | Our code is public. Our implementations are publicly available at https://github.com/IDSIA/GoGePo
Open Datasets | Yes | We evaluate our method on continuous control tasks from the MuJoCo (Todorov, Erez, and Tassa 2012) suite.
Dataset Splits | No | The paper describes training procedures and data collection through interaction with environments but does not specify explicit training/validation/test dataset splits with percentages or sample counts for a fixed dataset.
Hardware Specification | Yes | This work was supported by... the Swiss National Supercomputing Centre (CSCS, projects: s1090, s1154). We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award and IBM for donating a Minsky machine.
Software Dependencies | No | The paper mentions using the MuJoCo suite and various RL algorithms (DDPG, SAC, TD3, ARS, UDRL) but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | In the experiments, all policies are MLPs with two hidden layers, each having 256 neurons. Our method uses the same set of hyperparameters in all environments. For ARS and UDRL, we tune a set of hyperparameters separately for each environment (step size, population size, and noise for ARS; nonlinearity, learning rate, and the "last few" parameter for UDRL). Details can be found in Appendix A. (A minimal sketch of this policy architecture appears after the table.)
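
The Experiment Setup row states that all policies are MLPs with two hidden layers of 256 neurons each, evaluated on MuJoCo continuous control tasks. The sketch below is only an illustration of that stated architecture; the observation/action sizes and the Tanh nonlinearities are assumptions not taken from the paper.

```python
# Minimal sketch of the policy architecture quoted above: an MLP with two
# hidden layers of 256 units. Obs/action sizes and Tanh are assumptions.
import torch
import torch.nn as nn

def make_policy(obs_dim: int, act_dim: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(obs_dim, 256), nn.Tanh(),
        nn.Linear(256, 256), nn.Tanh(),
        nn.Linear(256, act_dim),
    )

policy = make_policy(obs_dim=17, act_dim=6)  # HalfCheetah-like sizes (assumption)
n_policy_params = sum(p.numel() for p in policy.parameters())
print(n_policy_params)  # size of the parameter vector a generator would output
```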
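The Research Type and Pseudocode rows describe the core idea of Algorithm 1 (GoGePo with return commands): a learned generator maps a commanded return to the parameters of a policy that achieves it. The authors' actual algorithm is in the linked repository; the following is only a minimal PyTorch-style sketch of that idea under stated assumptions. The class name `PolicyGenerator`, the hidden width, and the simple MSE regression on (achieved return, policy parameters) pairs are illustrative choices, not the paper's exact training objective.

```python
# Minimal sketch (assumptions noted above): a generator mapping a scalar
# return command to a flat vector of policy parameters, trained by regression
# on (achieved_return, policy_parameters) pairs collected during training.
import torch
import torch.nn as nn

class PolicyGenerator(nn.Module):  # hypothetical name
    def __init__(self, n_policy_params: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_policy_params),
        )

    def forward(self, return_command: torch.Tensor) -> torch.Tensor:
        # return_command: shape (batch, 1); output: flat policy parameter vectors
        return self.net(return_command)

def generator_update(generator, optimizer, commands, target_params):
    """One supervised step: push generated parameters toward parameters of
    policies that actually achieved the commanded returns (sketch only)."""
    pred = generator(commands)                   # (batch, n_policy_params)
    loss = ((pred - target_params) ** 2).mean()  # simple MSE surrogate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At deployment time, commanding the highest return seen during training would ask the generator for a correspondingly strong policy, which is the sense in which the abstract says the generator "can produce policies that achieve any return seen during training."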