An Evolution Strategy with Progressive Episode Lengths for Playing Games

Authors: Lior Fuks, Noor Awad, Frank Hutter, Marius Lindauer

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated PEL on a subset of Atari games from OpenAI Gym, showing that it can substantially improve the optimization speed, stability and final score of canonical ES. Specifically, we show average improvements of 80% (32%) after 2 hours (10 hours) compared to canonical ES.
Researcher Affiliation | Academia | Lior Fuks, Noor Awad, Frank Hutter and Marius Lindauer; University of Freiburg, Germany; {fuksl, awad, fh, lindauer}@cs.uni-freiburg.de
Pseudocode | Yes | Algorithm 1: Canonical Evolution Strategy; Algorithm 2: ES-based Progressive Episode Length
Open Source Code | No | The paper does not contain an explicit statement about the availability of the authors' source code or a link to a code repository for the methodology described.
Open Datasets | Yes | To evaluate the performance of our proposed algorithm, we used a set of Atari games [Chrabaszcz et al., 2018] from OpenAI Gym [Brockman et al., 2016] and used the parallelization technique introduced by Salimans et al. [2017] that reduces the communication needed between workers.
Dataset Splits | No | The paper does not specify distinct training, validation, and test dataset splits in the conventional sense (e.g., percentage splits of a static dataset). It refers to 'evaluating the top found policy for 30 times' and 'running five independent repetitions' for performance assessment, but not a separate validation split for hyperparameter tuning or early stopping.
Hardware Specification | Yes | Each run used 400 CPUs on a high-performance cluster equipped with Intel Xeon E5-2630v4 processors and 128GB RAM.
Software Dependencies | No | The paper mentions software components like OpenAI Gym and refers to network architectures (e.g., based on Mnih et al. [2015]), but it does not specify version numbers for any software dependencies, such as programming languages, libraries, or specific frameworks.
Experiment Setup | Yes | Table 1: Hyperparameters used in all ES variants (same as used by Chrabaszcz et al.): population size λ = 800; parent population size µ = 50; mutation step size σ = 0.01. We evaluated two time schedulers. T_c: the time limit is set to a constant of 1 hour, T_c(n) = 1. T_d: the time limit is set to 20 minutes and doubled in each iteration, T_d(n) = 20 · 2^n. Both versions use a doubling scheme to increase the maximal episode length.
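
The experiment setup above can be illustrated with a short Python sketch. It is not the authors' code (the summary notes that none was released), and it makes several assumptions that are not in the summary: both time limits are expressed in minutes, the iteration index n starts at 0, the recombination uses log-rank weights, and `fitness` is a hypothetical stand-in for an Atari episode return. Only the values λ = 800, µ = 50, σ = 0.01 and the two scheduler formulas come from the text.

```python
import numpy as np

# Hyperparameters from Table 1.
POPULATION_SIZE = 800  # lambda
PARENT_SIZE = 50       # mu
SIGMA = 0.01           # mutation step size


def time_limit_constant(n: int) -> float:
    """T_c(n): constant limit of 1 hour, returned in minutes (unit is an assumption)."""
    return 60.0


def time_limit_doubling(n: int) -> float:
    """T_d(n) = 20 * 2**n minutes; n is assumed to start at 0."""
    return 20.0 * 2 ** n


def rank_weights(mu: int) -> np.ndarray:
    """Log-rank recombination weights (an assumption, not stated in the summary)."""
    ranks = np.arange(1, mu + 1)
    raw = np.log(mu + 0.5) - np.log(ranks)
    return raw / raw.sum()


def es_step(theta, fitness, rng,
            lam=POPULATION_SIZE, mu=PARENT_SIZE, sigma=SIGMA):
    """One generation: sample lam offspring, keep the mu best, recombine."""
    noise = rng.standard_normal((lam, theta.size))
    scores = np.array([fitness(theta + sigma * eps) for eps in noise])
    best = np.argsort(scores)[::-1][:mu]  # indices of the mu highest scores
    return theta + sigma * rank_weights(mu) @ noise[best]
```

Under these assumptions, `time_limit_doubling` yields 20, 40 and 80 minutes for n = 0, 1, 2, while `time_limit_constant` always yields 60. The progressive episode length mechanism itself is left out, because the summary does not describe its starting length or the condition that triggers each doubling.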