Goal-Conditioned Generators of Deep Policies
Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance. |
| Researcher Affiliation | Collaboration | (1) The Swiss AI Lab IDSIA/USI/SUPSI, Lugano, Ticino, Switzerland; (2) AI Initiative, KAUST, Thuwal, Saudi Arabia; (3) NNAISENSE, Lugano, Switzerland |
| Pseudocode | Yes | Algorithm 1: GoGePo with return commands (a hedged sketch of the return-command idea appears after this table). |
| Open Source Code | Yes | Our code is public. Our implementations are publicly available at https://github.com/IDSIA/GoGePo |
| Open Datasets | Yes | We evaluate our method on continuous control tasks from the MuJoCo (Todorov, Erez, and Tassa 2012) suite. |
| Dataset Splits | No | The paper describes training procedures and data collection through interaction with environments but does not specify explicit training/validation/test dataset splits with percentages or sample counts for a fixed dataset. |
| Hardware Specification | Yes | This work was supported by... the Swiss National Supercomputing Centre (CSCS, projects: s1090, s1154). We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award, and IBM for donating a Minsky machine. |
| Software Dependencies | No | The paper mentions using the MuJoCo suite and various RL algorithms (DDPG, SAC, TD3, ARS, UDRL) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In the experiments, all policies are MLPs with two hidden layers, each having 256 neurons. Our method uses the same set of hyperparameters in all environments. For ARS and UDRL, we tune a set of hyperparameters separately for each environment (step size, population size, and noise for ARS; nonlinearity, learning rate, and the last few parameters for UDRL). Details can be found in Appendix A. (A hedged sketch of this policy architecture also follows the table.) |
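
The pseudocode row above names Algorithm 1 (GoGePo with return commands). As a rough illustration of that idea only, below is a minimal sketch assuming a generator network that maps a scalar return command to a flattened policy parameter vector and is trained by regression on a buffer of (return, parameters) pairs. The class name `PolicyGenerator`, the parameter dimension, the buffer contents, and the squared-error objective are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a return-conditioned policy generator ("GoGePo"-style idea).
# All names, dimensions, and the training objective are assumptions.
import torch
import torch.nn as nn

POLICY_PARAM_DIM = 1000  # assumed size of the flattened policy parameter vector

class PolicyGenerator(nn.Module):
    """Maps a scalar return command to a flattened policy parameter vector."""
    def __init__(self, param_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, param_dim),
        )

    def forward(self, command_return: torch.Tensor) -> torch.Tensor:
        return self.net(command_return)

generator = PolicyGenerator(POLICY_PARAM_DIM)
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

# Toy buffer of (observed return, flattened policy parameters) pairs;
# random stand-ins here instead of data collected from environment rollouts.
buffer_returns = torch.rand(128, 1) * 100.0
buffer_params = torch.randn(128, POLICY_PARAM_DIM)

for step in range(100):
    idx = torch.randint(0, buffer_returns.shape[0], (32,))
    commands, targets = buffer_returns[idx], buffer_params[idx]
    generated = generator(commands)
    loss = ((generated - targets) ** 2).mean()  # simplified regression objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```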
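
The experiment setup row states that all policies are MLPs with two hidden layers of 256 neurons each. The following is a minimal sketch of such a policy, assuming a MuJoCo-style continuous control task; the input/output dimensions, ReLU activations, and tanh output squashing are assumptions not given in the quoted text.

```python
# Hedged sketch of the stated policy architecture: an MLP with two 256-unit hidden layers.
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    """MLP policy with two hidden layers of 256 units, as stated in the paper."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # assumed bounded-action squashing
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Example with hypothetical MuJoCo-like dimensions (17-dim observation, 6-dim action).
policy = MLPPolicy(obs_dim=17, act_dim=6)
action = policy(torch.randn(1, 17))
```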