Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Goal-Conditioned Generators of Deep Policies
Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
AAAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance. |
| Researcher Affiliation | Collaboration | 1The Swiss AI Lab IDSIA/USI/SUPSI, Lugano, Ticino, Switzerland 2AI Initiative, KAUST, Thuwal, Saudi Arabia 3NNAISENSE, Lugano, Switzerland |
| Pseudocode | Yes | Algorithm 1: GoGePo with return commands |
| Open Source Code | Yes | Our code is public. Our implementations are publicly available2. 2https://github.com/IDSIA/GoGePo |
| Open Datasets | Yes | We evaluate our method on continuous control tasks from the MuJoCo (Todorov, Erez, and Tassa 2012) suite. |
| Dataset Splits | No | The paper describes training procedures and data collection through interaction with environments but does not specify explicit training/validation/test dataset splits with percentages or sample counts for a fixed dataset. |
| Hardware Specification | Yes | This work was supported by... the Swiss National Supercomputing Centre (CSCS, projects: s1090, s1154). We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award and to IBM for donating a Minsky machine. |
| Software Dependencies | No | The paper mentions using the MuJoCo suite and various RL algorithms (DDPG, SAC, TD3, ARS, UDRL) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In the experiments, all policies are MLPs with two hidden layers, each having 256 neurons. Our method uses the same set of hyperparameters in all environments. For ARS and UDRL, we tune a set of hyperparameters separately for each environment (step size, population size, and noise for ARS; nonlinearity, learning rate and the last few parameters for UDRL). Details can be found in Appendix A. |
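The policy architecture quoted above (an MLP with two hidden layers of 256 neurons) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the activation function, weight initialization, and the `obs_dim`/`act_dim` names are assumptions not specified in the quoted excerpt.

```python
import numpy as np

def init_policy(obs_dim, act_dim, hidden=256, seed=0):
    """Initialize an MLP policy with two hidden layers of 256 units,
    matching the architecture described in the Experiment Setup row.
    The scaled-normal init is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    sizes = [obs_dim, hidden, hidden, act_dim]
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def policy_forward(params, obs):
    """Forward pass: obs -> action. tanh activations are an assumption;
    the output is squashed to [-1, 1] as is common for continuous control."""
    h = obs
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return np.tanh(h @ W + b)

# Example with hypothetical dimensions (17-dim observation, 6-dim action):
params = init_policy(obs_dim=17, act_dim=6)
action = policy_forward(params, np.zeros(17))
```

For 17-dimensional observations and 6-dimensional actions this gives 17*256 + 256*256 + 256*6 weights plus biases, on the order of 70k parameters per policy, which is the scale of parameter vector the paper's generator must produce.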