Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

Authors: Patryk Chrabąszcz, Ilya Loshchilov, Frank Hutter

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally study the performance characteristics of both ES algorithms, demonstrating that (1) individual runs have high variance in performance and that (2) longer runs (5h instead of 1h) lead to significant further performance improvements." and "In our experiments, we evaluate the performance of the Canonical ES on a subset of 8 Atari games available in Open AI Gym."
Researcher Affiliation | Academia | "Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter, University of Freiburg, Freiburg, Germany, {chrabasp,ilya,fh}@cs.uni-freiburg.de"
Pseudocode | Yes | "Algorithm 1: Open AI ES" and "Algorithm 2: Canonical ES Algorithm"
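The Canonical ES of Algorithm 2 is a classic (µ/µ_w, λ) evolution strategy: sample λ Gaussian perturbations of the current parameters, evaluate each, and move the mean by a log-weighted recombination of the µ best. A minimal sketch (the hyperparameter values here are illustrative, not the paper's Atari settings, where µ = 50 and fitness is a noisy game score):

```python
import numpy as np

def canonical_es(fitness, theta0, sigma=0.05, lam=100, mu=50,
                 generations=300, seed=0):
    """Sketch of a (mu/mu_w, lambda) Canonical ES; `fitness` is maximised."""
    rng = np.random.default_rng(seed)
    # Log-decaying recombination weights over the mu best offspring.
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    theta = np.asarray(theta0, dtype=float)
    for _ in range(generations):
        eps = rng.standard_normal((lam, theta.size))        # lambda mutation vectors
        scores = np.array([fitness(theta + sigma * e) for e in eps])
        best = np.argsort(scores)[::-1][:mu]                # indices of top-mu offspring
        theta = theta + sigma * (w @ eps[best])             # weighted recombination step
    return theta

# Usage: maximise a toy quadratic fitness with optimum at x = 3.
theta = canonical_es(lambda x: -np.sum((x - 3.0) ** 2),
                     np.zeros(5), sigma=0.2, lam=40, mu=10, generations=500)
```

Unlike the OpenAI ES of Algorithm 1, there is no gradient-style update over all offspring; only the best µ contribute, with fixed weights.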
Open Source Code | Yes | "We make our implementation of the Canonical ES algorithm available online at https://github.com/PatrykChrabaszcz/Canonical-ES-Atari."
Open Datasets | Yes | "In our experiments, we evaluate the performance of the Canonical ES on a subset of 8 Atari games available in Open AI Gym [Brockman et al., 2016]."
Dataset Splits | No | The paper describes evaluation rollouts (e.g., "30 rollouts") but does not specify explicit train/validation/test splits (percentages, sample counts, or predefined static divisions) for reproduction.
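The paper's evaluation protocol (up to 30 initial random no-op actions, episodes capped at 25k steps) can be sketched as a rollout loop. This is a minimal sketch against a Gym-style reset()/step() interface; the stub environment below is hypothetical, standing in for an Atari game, and the assumption that action 0 is NOOP follows the Atari convention:

```python
import random

STEP_LIMIT = 25_000   # the paper caps episodes at 25k steps
MAX_NOOPS = 30        # and starts each episode with up to 30 random no-op actions
NOOP = 0              # Atari convention: action 0 is NOOP

def rollout(env, policy, rng):
    """One evaluation episode: random-length no-op start, then policy actions."""
    obs = env.reset()
    total, steps = 0.0, 0
    for _ in range(rng.randint(0, MAX_NOOPS)):       # no-op start
        obs, reward, done, _ = env.step(NOOP)
        total += reward
        steps += 1
        if done:
            return total
    while steps < STEP_LIMIT:                        # main loop under the step cap
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        steps += 1
        if done:
            break
    return total

# Usage with a toy stand-in environment: reward 1 per step, episode ends at step 100.
class StubEnv:
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return 0, 1.0, self.t >= 100, {}

score = rollout(StubEnv(), policy=lambda obs: 0, rng=random.Random(0))  # -> 100.0
```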
Hardware Specification | No | The paper mentions running experiments on "400 CPUs" but gives no specific hardware details such as CPU model numbers, GPU specifications, or memory.
Software Dependencies | No | The paper mentions architectural components and techniques such as "ELU" and "batch normalization" but does not list software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9").
Experiment Setup | Yes | Network architecture: "We use the same network structure as the original DQN work [Mnih and others, 2015], only changing the activation function from ReLU to ELU [Clevert et al., 2015] and adding batch normalization layers [Ioffe and Szegedy, 2015]. The network as presented in Figure 3 has approximately 1.7M parameters. We initialize network weights using samples from a normal distribution N(µ = 0, σ = 0.05)." Virtual batch normalization: "Following Salimans et al. [2017], we use virtual batch normalization [Salimans et al., 2016]..." Training: "For each game and each ES variant we tested, we performed 3 training runs, each on 400 CPUs with a time budget of 10 hours... We limit episodes to have a maximum length of 25k steps... We start each episode with up to 30 initial random no-op actions. We fixed µ = 50 for all games."
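The "approximately 1.7M parameters" figure can be sanity-checked from the original DQN layer sizes (Mnih et al., 2015), which the paper says it reuses: three conv layers over 4 stacked 84x84 frames, a 512-unit fully connected layer, and an output layer. The 18-action output used here is an assumption (the action count varies per Atari game), and the small batch-norm parameter count is omitted:

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

side = conv_out(conv_out(conv_out(84, 8, 4), 4, 2), 3, 1)   # 84 -> 20 -> 9 -> 7

params = (
    4 * 8 * 8 * 32 + 32              # conv1: 4 input frames, 32 8x8 filters, stride 4
    + 32 * 4 * 4 * 64 + 64           # conv2: 64 4x4 filters, stride 2
    + 64 * 3 * 3 * 64 + 64           # conv3: 64 3x3 filters, stride 1
    + side * side * 64 * 512 + 512   # FC: 7*7*64 = 3136 inputs -> 512 units
    + 512 * 18 + 18                  # output layer: 18 actions (assumed)
)
# params == 1_693_362, consistent with the paper's "~1.7M"
```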