Planning With Pixels in (Almost) Real Time

Authors: Wilmer Bandres, Blai Bonet, Hector Geffner

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The benchmark contains 58 games for the Atari 2600, all with screens of 160 x 210 pixels. Experiments were performed on an Amazon EC2 cluster made of m4.16xlarge instances, each featuring 64 Intel Xeon E5-2686 CPUs running at 2.30GHz and 256 GB of RAM. IW(1) and three versions of Rollout IW(1) running over the B-PROST features are compared with the shallow RL algorithm using the full Blob-PROST feature set (Liang et al. 2016), the DQN algorithm, the results of the human player reported in (Mnih et al. 2015), and the IW(1) algorithm over the RAM states (Lipovetzky, Ramírez, and Geffner 2015). The last one is included only as a reference. (A sketch of the Basic layer of B-PROST-style features appears after the table.)
Researcher Affiliation | Academia | Wilmer Bandres, Universitat Pompeu Fabra, Barcelona, Spain, twilmer0593@gmail.com; Blai Bonet, Universidad Simón Bolívar, Caracas, Venezuela, bonet@usb.ve; Hector Geffner, ICREA & Universitat Pompeu Fabra, Barcelona, Spain, hector.geffner@upf.edu
Pseudocode | Yes | Figure 1: Pseudo-code of Rollout IW(1). (A simplified sketch of the algorithm appears after the table.)
Open Source Code | No | The paper does not provide explicit statements about releasing their source code or links to a code repository for the described methodology.
Open Datasets | Yes | The Atari 2600 video games supported in the ALE environment (Bellemare et al. 2013) provide a challenging set of benchmarks for reinforcement learning (RL) and planning algorithms. The benchmark contains 58 games for the Atari 2600, all with screens of 160 x 210 pixels. (A minimal environment-loading sketch appears after the table.)
Dataset Splits | No | The paper describes an online planning setting where performance is evaluated on game environments, rather than using a predefined dataset with explicit training, validation, and test splits in the traditional sense of supervised learning.
Hardware Specification | Yes | Experiments were performed on an Amazon EC2 cluster made of m4.16xlarge instances, each featuring 64 Intel Xeon E5-2686 CPUs running at 2.30GHz and 256 GB of RAM.
Software Dependencies | No | The paper mentions the ALE environment but does not specify software versions for it or for any other software dependencies (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | The algorithms are evaluated with time budgets for online decision making of 0.5 and 32 seconds, and a frameskip of 15 that is compatible with human play. In addition, for actions that do not manage to change the value of any B-PROST feature in 15 frames, the action is applied for another 15 frames before pruning the node. To discourage deaths, negative rewards are multiplied by a large constant α = 50,000 and a high negative reward of 10α is used for deaths. (A sketch of this reward shaping appears after the table.)
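
On the feature side, B-PROST builds on a "Basic" layer that records which screen colors appear in which tiles of a coarse grid over the 160 x 210 screen (Liang et al. 2016). The following is a hedged sketch of that layer only: the 14 x 16 tiling and raw palette indices are illustrative assumptions rather than the paper's exact configuration, and the pairwise-offset (B-PROS) and temporal (B-PROT) layers are omitted.

```python
import numpy as np

def basic_features(screen):
    """Illustrative sketch of "Basic" screen features in the spirit of
    B-PROST (Liang et al. 2016): a boolean feature (color, row, col) is
    set whenever that color appears inside tile (row, col) of a coarse
    grid.  The 14 x 16 tiling and the raw palette indices are assumptions,
    not the exact configuration used in the paper."""
    h, w = screen.shape            # e.g. a 210 x 160 array of palette indices
    rows, cols = 14, 16            # coarse grid; each tile is (h//rows) x (w//cols)
    th, tw = h // rows, w // cols
    feats = set()
    for i in range(rows):
        for j in range(cols):
            tile = screen[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            for color in np.unique(tile):   # colors present in this tile
                feats.add((int(color), i, j))
    return feats
```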
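The pseudo-code in the paper's Figure 1 maintains a lookahead tree and resumes rollouts from it; the snippet below is a deliberately simplified, memoryless Python sketch of the core mechanism only: random rollouts pruned by a width-1 novelty test over boolean features. The callables `step`, `features`, and `reward` and all parameter values are placeholders for illustration, not the authors' API.

```python
import random

def rollout_iw1(root, actions, step, features, reward,
                num_rollouts=200, horizon=50, seed=0):
    """Simplified sketch of Rollout IW(1): random rollouts pruned by a
    width-1 novelty test.  A state at depth d counts as novel iff it makes
    some feature true at a depth no greater than the shallowest depth
    recorded so far for that feature."""
    rng = random.Random(seed)
    seen = {}  # feature -> shallowest depth at which it was seen true

    def novel(state, depth):
        fs = features(state)
        is_novel = any(seen.get(f, float("inf")) >= depth for f in fs)
        for f in fs:                          # update the novelty table
            if depth < seen.get(f, float("inf")):
                seen[f] = depth
        return is_novel

    novel(root, 0)                            # root state seeds the table
    best_return, best_action = float("-inf"), None
    for _ in range(num_rollouts):
        state, ret, first = root, 0.0, None
        for depth in range(1, horizon + 1):
            a = rng.choice(actions)
            first = first if first is not None else a
            state = step(state, a)
            ret += reward(state)
            if not novel(state, depth):       # prune non-novel states
                break
        if first is not None and ret > best_return:
            best_return, best_action = ret, first
    return best_action                        # action to execute at the root
```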
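For reproduction, the game environments can be driven through ALE. A minimal loading sketch using the ale-py Python binding follows; the binding and the ROM path are assumptions here, since the original experiments interfaced with ALE directly.

```python
from ale_py import ALEInterface  # pip install ale-py; not what the paper used

ale = ALEInterface()
ale.setInt("random_seed", 0)
ale.loadROM("roms/pong.bin")          # placeholder path to an Atari 2600 ROM
actions = ale.getMinimalActionSet()   # legal actions for this game
screen = ale.getScreenRGB()           # numpy array of shape (210, 160, 3)
reward = ale.act(actions[0])          # apply one action for one frame
done = ale.game_over()
```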
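The reward shaping in the setup row reduces to a few lines. In this hedged sketch, only the constant α = 50,000 and the 10α death penalty come from the paper; the function name and the `lost_life` flag are assumptions.

```python
ALPHA = 50_000  # scaling constant for negative rewards, from the paper

def shaped_reward(raw_reward, lost_life):
    """Sketch of the reward shaping in the experimental setup: negative
    rewards are multiplied by ALPHA, and a death adds a penalty of
    10 * ALPHA.  The lost_life flag is an assumed input."""
    r = raw_reward * ALPHA if raw_reward < 0 else raw_reward
    if lost_life:
        r -= 10 * ALPHA
    return r
```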