reproducibilityindex.ai

An Evolution Strategy with Progressive Episode Lengths for Playing Games

Authors: Lior Fuks, Noor Awad, Frank Hutter, Marius Lindauer

IJCAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluated PEL on a subset of Atari games from Open AI Gym, showing that it can substantially improve the optimization speed, stability and final score of canonical ES. Specifically, we show average improvements of 80% (32%) after 2 hours (10 hours) compared to canonical ES.
Researcher Affiliation	Academia	Lior Fuks, Noor Awad, Frank Hutter and Marius Lindauer; University of Freiburg, Germany; {fuksl, awad, fh, lindauer}@cs.uni-freiburg.de
Pseudocode	Yes	Algorithm 1: Canonical Evolution Strategy; Algorithm 2: ES-based Progressive Episode Length
Open Source Code	No	The paper does not contain an explicit statement about the availability of the authors' source code or a link to a code repository for the methodology described.
Open Datasets	Yes	To evaluate the performance of our proposed algorithm, we used a set of Atari games [Chrabaszcz et al., 2018] from Open AI Gym [Brockman et al., 2016] and used the parallelization technique introduced by Salimans et al. [2017] that reduces the communication needed between workers.
Dataset Splits	No	The paper does not specify distinct training, validation, and test dataset splits in the conventional sense (e.g., percentage splits of a static dataset). It refers to 'evaluating the top found policy for 30 times' and 'running five independent repetitions' for performance assessment, but not a separate validation split for hyperparameter tuning or early stopping.
Hardware Specification	Yes	Each run used 400 CPUs on a high-performance cluster equipped with Intel Xeon E5-2630v4 processors and 128GB RAM.
Software Dependencies	No	The paper mentions software components like Open AI Gym and refers to network architectures (e.g., based on Mnih et al. [2015]), but it does not specify version numbers for any software dependencies, such as programming languages, libraries, or specific frameworks.
Experiment Setup	Yes	Table 1: Hyperparameters used in all ES variants (same as used by Chrabaszcz et al.): population size λ = 800; parent population size µ = 50; mutation step size σ = 0.01. We evaluated two time schedulers: T_c, where the time limit is set to a constant of 1 hour, T_c(n) = 1; and T_d, where the time limit is set to 20 minutes and doubled in each iteration, T_d(n) = 20 · 2^n. Both versions use a doubling scheme to increase the maximal episode length.
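
The experiment setup quoted in the last row can be written down as a small configuration sketch. This is only an illustration of the reported values, not the authors' code (none was released). The names `ESConfig`, `constant_time_limit_minutes`, `doubling_time_limit_minutes` and `episode_length_schedule` are hypothetical. The sketch assumes the iteration index n starts at 0, converts the 1-hour constant limit to minutes, and takes the starting episode length as a caller-supplied parameter, since the quoted text does not give one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ESConfig:
    """Hyperparameters reported in Table 1 (shared by all ES variants)."""
    population_size: int = 800          # lambda
    parent_population_size: int = 50    # mu
    mutation_step_size: float = 0.01    # sigma


def constant_time_limit_minutes(n: int) -> float:
    """T_c: constant limit of 1 hour per iteration, expressed in minutes."""
    return 60.0


def doubling_time_limit_minutes(n: int) -> float:
    """T_d(n) = 20 * 2**n minutes; n = 0 is assumed to be the first iteration."""
    return 20.0 * 2 ** n


def episode_length_schedule(initial_length: int, iterations: int) -> List[int]:
    """Maximal episode length, doubled at each iteration.

    initial_length is a hypothetical parameter: the quoted text only says
    that a doubling scheme is used, not where it starts.
    """
    return [initial_length * 2 ** n for n in range(iterations)]
```

For example, `doubling_time_limit_minutes(3)` gives 160 minutes under these assumptions, while `constant_time_limit_minutes(3)` stays at 60.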