Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari
Authors: Patryk Chrabąszcz, Ilya Loshchilov, Frank Hutter
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We experimentally study the performance characteristics of both ES algorithms, demonstrating that (1) individual runs have high variance in performance and that (2) longer runs (5h instead of 1h) lead to significant further performance improvements." and "In our experiments, we evaluate the performance of the Canonical ES on a subset of 8 Atari games available in Open AI Gym." |
| Researcher Affiliation | Academia | Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter; University of Freiburg, Freiburg, Germany; {chrabasp,ilya,fh}@cs.uni-freiburg.de |
| Pseudocode | Yes | Algorithm 1: Open AI ES and Algorithm 2: Canonical ES Algorithm |
| Open Source Code | Yes | We make our implementation of the Canonical ES algorithm available online at https://github.com/PatrykChrabaszcz/Canonical-ES-Atari. |
| Open Datasets | Yes | In our experiments, we evaluate the performance of the Canonical ES on a subset of 8 Atari games available in Open AI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper describes evaluation rollouts (e.g., '30 rollouts') but does not specify explicit train/validation/test dataset splits in terms of percentages, sample counts, or predefined static data divisions for reproduction. |
| Hardware Specification | No | The paper mentions running experiments on '400 CPUs' but does not provide specific hardware details such as CPU model numbers, GPU specifications, or memory. |
| Software Dependencies | No | The paper mentions architectural components and techniques like 'ELU' and 'batch normalization' but does not specify software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9'). |
| Experiment Setup | Yes | Network Architecture: We use the same network structure as the original DQN work [Mnih and others, 2015], only changing the activation function from ReLU to ELU [Clevert et al., 2015] and adding batch normalization layers [Ioffe and Szegedy, 2015]. The network as presented in Figure 3 has approximately 1.7M parameters. We initialize network weights using samples from a normal distribution N(µ = 0, σ = 0.05). Virtual Batch Normalization: Following Salimans et al. [2017], we use virtual batch normalization [Salimans et al., 2016]... Training: For each game and each ES variant we tested, we performed 3 training runs, each on 400 CPUs with a time budget of 10 hours... We limit episodes to have a maximum length of 25k steps... We start each episode with up to 30 initial random no-op actions. We fixed µ = 50 for all games. |
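The Canonical ES referenced in the pseudocode row (Algorithm 2) is a classic (µ, λ)-ES with log-rank weighted recombination. A minimal sketch of that update rule is below; the hyperparameter values and the `evaluate` function are illustrative assumptions, not the paper's settings (the paper fixes µ = 50 and evaluates fitness via Atari game rollouts).

```python
import numpy as np

def canonical_es(evaluate, theta, sigma=0.05, lam=40, mu=20,
                 iterations=100, seed=0):
    """Sketch of a canonical (mu, lambda)-ES with weighted recombination.

    `evaluate` maps a parameter vector to a scalar fitness (higher is
    better). In the paper this would be the episode reward of a policy
    network parameterized by `theta`; here it is any callable.
    """
    rng = np.random.default_rng(seed)
    # Log-rank recombination weights for the top-mu offspring,
    # normalized to sum to 1: w_j ∝ log(mu + 0.5) - log(j).
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    for _ in range(iterations):
        # Sample lam Gaussian perturbation directions.
        eps = rng.standard_normal((lam, theta.size))
        # Evaluate each offspring theta + sigma * eps_i.
        fitness = np.array([evaluate(theta + sigma * e) for e in eps])
        # Rank offspring by fitness (descending) and keep the best mu.
        top = np.argsort(fitness)[::-1][:mu]
        # Move the parent by the weighted mean of the elite directions.
        theta = theta + sigma * (w @ eps[top])
    return theta
```

In the paper's distributed setup, the λ fitness evaluations per iteration would be farmed out across the 400 CPUs; the sketch above runs them serially for clarity.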