Discovering Evolution Strategies via Meta-Black-Box Optimization
Authors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that meta-evolving this system on a small set of representative low-dimensional analytic optimization problems is sufficient to discover new evolution strategies capable of generalizing to unseen optimization problems, population sizes and optimization horizons. Furthermore, the same learned evolution strategy can outperform established neuroevolution baselines on supervised and continuous control tasks. As additional contributions, we ablate the individual neural network components of our method [...] |
| Researcher Affiliation | Collaboration | Robert Tjarko Lange (Technical University Berlin); Tom Schaul, Yutian Chen, Tom Zahavy (DeepMind); Valentin Dalibard (DeepMind); Chris Lu (University of Oxford); Satinder Singh, Sebastian Flennerhag (DeepMind) |
| Pseudocode | Yes | Appendix C (MetaBBO Algorithm for Meta-Evolving Evolution Strategies) gives Algorithm 2: MetaBBO Training of Learned Evolution Strategies. Appendix F (Pseudo-Code Implementations for DES & LES) gives Listing 1: JAX-Based Pseudo-Code for the Discovered Evolution Strategy (F.1) and Listing 2: JAX-Based Pseudo-Code for the Learned Evolution Strategy (F.2). (A minimal sketch of the ask/tell structure these listings follow appears after the table.) |
| Open Source Code | Yes | This project has been made possible by the usage of freely available Open Source software. This includes the following: NumPy: Harris et al. (2020), Matplotlib: Hunter (2007), Seaborn: Waskom (2021), JAX: Bradbury et al. (2018), Evosax: Lange (2022a), Gymnax: Lange (2022b), Evojax: Tang et al. (2022), Brax: Freeman et al. (2021). (And from the bibliography, Lange (2022a) is "evosax: JAX-based evolution strategies, 2022a. URL http://github.com/RobertTLange/evosax".) |
| Open Datasets | Yes | Throughout, we meta-train on functions from the BBOB (Hansen et al., 2010) benchmark, which comprise a set of challenging functions for optimization (Table 1). Evaluation additionally covers continuous control Brax environments (Freeman et al., 2021) and MNIST variants. |
| Dataset Splits | No | No specific train/validation/test dataset splits with percentages or counts were explicitly stated for the overall experimental setup. The paper describes meta-training on a set of functions and then evaluating on unseen functions and tasks, which implies a training and testing distinction, but not a formal validation split of a dataset. |
| Hardware Specification | Yes | The simulations were run on individual NVIDIA V100 GPUs. |
| Software Dependencies | No | This project has been made possible by the usage of freely available Open Source software. This includes the following: NumPy: Harris et al. (2020), Matplotlib: Hunter (2007), Seaborn: Waskom (2021), JAX: Bradbury et al. (2018), Evosax: Lange (2022a), Gymnax: Lange (2022b), Evojax: Tang et al. (2022), Brax: Freeman et al. (2021). (Specific version numbers are not provided for the listed software dependencies.) |
| Experiment Setup | Yes | Appendix D (Hyperparameter Settings). D.1 MetaBBO hyperparameters: Meta-Generations 1500; Meta-Population 256; Meta-Tasks 128; Inner-Population 16; Inner-Generations 50; m0 range [-5, 5]; Timestamp Range [0, 2000]; D_K Attention keys 8; MLP Hidden dim. 8. D.2 Neuroevolution evaluation hyperparameters (Brax task evaluation): Generations 2000; Population 256; MC Evaluations 16; MLP Layers 4; Hidden Units 32; Activation Tanh. (Restated as a config sketch after the table.) |
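The pseudocode row quotes the appendix listings for the discovered (DES) and learned (LES) strategies only by title. The following is a minimal, self-contained sketch of the ask/tell structure such a diagonal-Gaussian evolution strategy follows; hand-coded rank-based softmax weights stand in for the learned attention/MLP recombination module of LES, and the function names, learning rates, and sphere objective are illustrative assumptions rather than the paper's released code.

```python
import jax
import jax.numpy as jnp

def ask(rng, mean, sigma, popsize):
    """Sample an isotropic Gaussian population around the current mean."""
    z = jax.random.normal(rng, (popsize, mean.shape[0]))
    return mean + sigma * z, z

def tell(x, z, fitness, mean, sigma, lrate_mean=1.0, lrate_sigma=0.1):
    """Recombine the population into an updated mean and step size."""
    # Rank-based softmax weights stand in for the learned attention/MLP
    # module of LES (Listing 2 in the paper); the best member (lowest
    # fitness, minimization) receives the largest weight.
    ranks = jnp.argsort(jnp.argsort(fitness)).astype(jnp.float32)
    weights = jax.nn.softmax(-ranks)
    new_mean = mean + lrate_mean * jnp.sum(weights[:, None] * (x - mean), axis=0)
    log_sigma_update = jnp.sum(weights * (jnp.abs(z).mean(axis=1) - 1.0))
    new_sigma = sigma * jnp.exp(lrate_sigma * log_sigma_update)
    return new_mean, new_sigma

# Usage: minimize a 2-D sphere function, mirroring the inner-loop scale
# from Appendix D.1 (population 16, 50 generations).
rng = jax.random.PRNGKey(0)
mean, sigma = jnp.zeros(2), jnp.array(1.0)
for _ in range(50):
    rng, rng_ask = jax.random.split(rng)
    x, z = ask(rng_ask, mean, sigma, popsize=16)
    fitness = jnp.sum(x ** 2, axis=1)
    mean, sigma = tell(x, z, fitness, mean, sigma)
```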
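For convenience, the Appendix D settings quoted in the Experiment Setup row can be restated as plain Python dictionaries. The key names below are assumptions chosen for readability; only the values come from the paper.

```python
# Hypothetical key names; values quoted from Appendix D of the paper.
metabbo_hyperparams = dict(
    meta_generations=1500,
    meta_population=256,
    meta_tasks=128,
    inner_population=16,
    inner_generations=50,
    m0_range=(-5.0, 5.0),      # initial mean sampling range
    timestamp_range=(0, 2000),
    attention_keys=8,          # D_K
    mlp_hidden_dim=8,
)

brax_evaluation_hyperparams = dict(
    generations=2000,
    population=256,
    mc_evaluations=16,
    mlp_layers=4,
    hidden_units=32,
    activation="tanh",
)
```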