An Evolution Strategy with Progressive Episode Lengths for Playing Games
Authors: Lior Fuks, Noor Awad, Frank Hutter, Marius Lindauer
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated PEL on a subset of Atari games from OpenAI Gym, showing that it can substantially improve the optimization speed, stability and final score of canonical ES. Specifically, we show average improvements of 80% (32%) after 2 hours (10 hours) compared to canonical ES. |
| Researcher Affiliation | Academia | Lior Fuks, Noor Awad, Frank Hutter and Marius Lindauer; University of Freiburg, Germany; {fuksl, awad, fh, lindauer}@cs.uni-freiburg.de |
| Pseudocode | Yes | Algorithm 1: Canonical Evolution Strategy; Algorithm 2: ES-based Progressive Episode Length |
| Open Source Code | No | The paper does not contain an explicit statement about the availability of the authors' source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | To evaluate the performance of our proposed algorithm, we used a set of Atari games [Chrabaszcz et al., 2018] from OpenAI Gym [Brockman et al., 2016] and used the parallelization technique introduced by Salimans et al. [2017] that reduces the communication needed between workers. |
| Dataset Splits | No | The paper does not specify distinct training, validation, and test dataset splits in the conventional sense (e.g., percentage splits of a static dataset). It refers to 'evaluating the top found policy for 30 times' and 'running five independent repetitions' for performance assessment, but not a separate validation split for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | Each run used 400 CPUs on a high-performance cluster equipped with Intel Xeon E5-2630v4 processors and 128GB RAM. |
| Software Dependencies | No | The paper mentions software components such as OpenAI Gym and refers to network architectures (e.g., based on Mnih et al. [2015]), but it does not give version numbers for any software dependencies, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | Table 1: Hyperparameters used in all ES variants (same as used by Chrabaszcz et al.): population size λ = 800; parent population size µ = 50; mutation step size σ = 0.01. We evaluated two time schedulers. Tc: the time limit is set to a constant of 1 hour, Tc(n) = 1. Td: the time limit is set to 20 minutes and doubled in each iteration, Td(n) = 20 · 2^n. Both versions use a doubling scheme to increase the maximal episode length. |
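
The experiment setup in the last row can be written down as a small configuration sketch. This is not the authors' code (the paper does not release any). It only encodes the reported hyperparameters and the two time schedulers. The names `ESConfig`, `constant_time_limit`, `doubling_time_limit` and `max_episode_length` are hypothetical, both schedulers are expressed in minutes for convenience, the iteration index `n` is assumed to start at 0, and the initial episode length is left as a caller-supplied parameter because the quoted text does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ESConfig:
    """Hyperparameters reported in Table 1 (hypothetical container name)."""
    population_size: int = 800        # lambda
    parent_population_size: int = 50  # mu
    mutation_step_size: float = 0.01  # sigma


def constant_time_limit(n: int) -> float:
    """Tc(n): constant time limit of 1 hour, returned here in minutes."""
    return 60.0


def doubling_time_limit(n: int) -> float:
    """Td(n) = 20 * 2^n minutes; n is assumed to start at 0."""
    if n < 0:
        raise ValueError("iteration index must be non-negative")
    return 20.0 * 2 ** n


def max_episode_length(initial_length: int, n: int) -> int:
    """Doubling scheme for the maximal episode length.

    initial_length is an assumption supplied by the caller; the quoted
    setup does not give its value.
    """
    if n < 0:
        raise ValueError("iteration index must be non-negative")
    return initial_length * 2 ** n


if __name__ == "__main__":
    config = ESConfig()
    print(config)
    for n in range(4):
        print(n, constant_time_limit(n), doubling_time_limit(n))
```

Under these assumptions, the first four time limits of the doubling scheduler are 20, 40, 80 and 160 minutes, while the constant scheduler stays at 60 minutes throughout.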