Blind Search for Atari-Like Online Planning Revisited
Authors: Alexander Shleyfman, Alexander Tuisov, Carmel Domshlak
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce and evaluate prioritized IW(i), a simple extension of IW(i) that approximates breadth-first search with duplicate detection and state reopening, and show that it very favorably competes with IW(i) on the Atari games. We then revisit the basic objective underlying deterministic online planning. We argue that the effectiveness of online planning for the Atari games and related problems can be further improved by considering this problem as a multiarmed bandit style competition between the various actions available at the state planned for, and not purely as a classical planning style action sequence optimization problem. Following this lead, we introduce a simple modification of prioritized IW(i) that fits the modified objective, and empirically demonstrate the prospects of this direction. |
| Researcher Affiliation | Academia | Alexander Shleyfman, Alexander Tuisov and Carmel Domshlak, Faculty of Industrial Engineering and Management, Technion, Israel |
| Pseudocode | No | The paper describes algorithms in prose but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper states 'We used the implementation of IW(1) by Lipovetzky et al. [2015], and have implemented p-IW(1) on top of it,' but does not provide an explicit statement about releasing their own code or a link to a repository. |
| Open Datasets | Yes | We tested p-IW(1) and IW(1) on 53 of the 55 different games considered by Bellemare et al. [2013] |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'the implementation of IW(1) by Lipovetzky et al. [2015]' and implementing 'p-IW(1) on top of it,' but does not provide specific software names with version numbers for replication. |
| Experiment Setup | Yes | each action selection decision was given a lookahead budget of 150000 simulated frames (or, equivalently, 30000 search nodes), the lookahead depth was limited to 1500 frames, and the accumulated rewards were discounted as R(s') = R(s) + γ^(d(s)+1) · r(s, a), where s is the unique parent of s', a is the respective action, and the discount factor was set to γ = 0.995. To reduce the variance, each game was played 30 times, with the reported results being averaged across these runs. |
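
The Research Type row quotes the paper's description of prioritized IW(i) as an extension of IW(i), a breadth-first lookahead that prunes non-novel states. The sketch below illustrates only the underlying IW(1)-style novelty-1 pruning that the paper builds on, not the authors' prioritized variant; the state, `successors`, and `features` interfaces are assumptions for illustration, not taken from the paper.

```python
from collections import deque

def iw1_lookahead(root, successors, features, node_budget=30000):
    """Breadth-first lookahead with novelty-1 pruning (IW(1)-style sketch).

    A generated state is kept only if at least one of its features (atoms)
    has not been seen in any previously generated state; otherwise it is
    pruned.  `successors(s)` is assumed to yield (action, child, reward)
    triples and `features(s)` an iterable of hashable atoms.
    """
    seen_atoms = set(features(root))
    queue = deque([root])
    expanded = 0
    while queue and expanded < node_budget:
        state = queue.popleft()
        expanded += 1
        for _action, child, _reward in successors(state):
            new_atoms = [a for a in features(child) if a not in seen_atoms]
            if not new_atoms:
                continue  # novelty > 1: prune this state
            seen_atoms.update(new_atoms)
            queue.append(child)
    return expanded
```

The paper's p-IW(1) further adds duplicate detection, state reopening, and a prioritized expansion order on top of this basic pruning scheme.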
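The Experiment Setup row quotes the discounted reward accumulation rule used in the lookahead. The following minimal sketch restates that formula in code under the reported constant γ = 0.995; the function name and arguments are illustrative assumptions.

```python
GAMMA = 0.995  # discount factor reported in the paper

def accumulate_reward(R_parent, depth_parent, immediate_reward):
    """Discounted accumulation R(s') = R(s) + γ^(d(s)+1) * r(s, a),
    where R_parent is the accumulated reward of the parent s,
    depth_parent is d(s), and immediate_reward is r(s, a)."""
    return R_parent + (GAMMA ** (depth_parent + 1)) * immediate_reward

# Example: a child of the root (parent depth 0) with immediate reward 10
# adds 0.995**1 * 10 = 9.95 to the parent's accumulated reward.
assert abs(accumulate_reward(0.0, 0, 10.0) - 9.95) < 1e-9
```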