Blind Search for Atari-Like Online Planning Revisited

Authors: Alexander Shleyfman, Alexander Tuisov, Carmel Domshlak

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce and evaluate prioritized IW(i), a simple extension of IW(i) that approximates breadth-first search with duplicate detection and state reopening, and show that it very favorably competes with IW(i) on the Atari games. We then revisit the basic objective underlying deterministic online planning. We argue that the effectiveness of online planning for the Atari games and related problems can be further improved by considering this problem as a multiarmed bandit style competition between the various actions available at the state planned for, and not purely as a classical planning style action sequence optimization problem. Following this lead, we introduce a simple modification of prioritized IW(i) that fits the modified objective, and empirically demonstrate the prospects of this direction. (An illustrative sketch of the IW(1) novelty test that these algorithms build on follows the table.)
Researcher Affiliation | Academia | Alexander Shleyfman, Alexander Tuisov, and Carmel Domshlak, Faculty of Industrial Engineering and Management, Technion, Israel
Pseudocode | No | The paper describes algorithms in prose but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper states 'We used the implementation of IW(1) by Lipovetzky et al. [2015], and have implemented p-IW(1) on top of it,' but does not provide an explicit statement about releasing their own code or a link to a repository.
Open Datasets | Yes | We tested p-IW(1) and IW(1) on 53 of the 55 different games considered by Bellemare et al. [2013]
Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test dataset splits.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'the implementation of IW(1) by Lipovetzky et al. [2015]' and implementing 'p-IW(1) on top of it,' but does not provide specific software names with version numbers for replication.
Experiment Setup | Yes | Each action selection decision was given a lookahead budget of 150000 simulated frames (or, equivalently, 30000 search nodes), the lookahead depth was limited to 1500 frames, and the accumulated rewards were discounted as R(s′) = R(s) + γ^(d(s)+1) · r(s, a), where s is the unique parent of s′, a is the respective action, and the discount factor was set to γ = 0.995. To reduce the variance, each game was played 30 times, with the reported results being averaged across these runs. (A worked example of this discounting rule follows the table.)
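
Illustrative sketch (ours, not from the paper): the Research Type row above refers to IW(i) and prioritized IW(i). The minimal Python sketch below shows only the base IW(1) novelty-1 pruning rule that these algorithms build on, under assumed callbacks successors(state) and atoms(state) (both hypothetical names). It does not implement the authors' prioritized variant, which, per the abstract, approximates breadth-first search with duplicate detection and state reopening.

    from collections import deque

    def iw1_lookahead(root, successors, atoms, budget=30000):
        """Breadth-first lookahead with the IW(1) novelty-1 pruning rule:
        a generated state is kept only if it makes at least one atom true
        for the first time in this search; otherwise it is pruned."""
        seen_atoms = set(atoms(root))   # atoms already made true by a kept state
        queue = deque([root])
        expanded = 0
        while queue and expanded < budget:
            state = queue.popleft()
            expanded += 1
            for child in successors(state):
                new_atoms = [a for a in atoms(child) if a not in seen_atoms]
                if new_atoms:           # novel: some atom reached for the first time
                    seen_atoms.update(new_atoms)
                    queue.append(child)
                # non-novel children are pruned
        return expanded

In the Atari setting of Lipovetzky et al. [2015], the atoms are byte-value features over the emulator's RAM, and the budget corresponds to the 30000 search nodes mentioned in the Experiment Setup row.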
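
Worked example (ours, not from the paper): the discounting rule quoted in the Experiment Setup row, R(s′) = R(s) + γ^(d(s)+1) · r(s, a) with γ = 0.995, can be unrolled along a single path as follows; the constant name GAMMA and the helper accumulate are hypothetical, introduced only for illustration.

    GAMMA = 0.995  # discount factor reported in the paper

    def accumulate(parent_R, parent_depth, reward):
        """Discounted accumulated reward of a child node:
        R(s') = R(s) + GAMMA**(d(s) + 1) * r(s, a),
        where d(s) is the depth of the parent s and r(s, a) the immediate reward."""
        return parent_R + (GAMMA ** (parent_depth + 1)) * reward

    # Rewards of 1.0 collected at the first three steps of a single path:
    R, depth = 0.0, 0
    for r in (1.0, 1.0, 1.0):
        R = accumulate(R, depth, r)
        depth += 1
    print(round(R, 4))  # 0.995 + 0.995**2 + 0.995**3 ≈ 2.9701

Under this rule, rewards collected deeper in the lookahead (which is capped at 1500 frames) are discounted more heavily in the accumulated value R.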