Blind Search for Atari-Like Online Planning Revisited
Authors: Alexander Shleyfman, Alexander Tuisov, Carmel Domshlak
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce and evaluate prioritized IW(i), a simple extension of IW(i) that approximates breadth-first search with duplicate detection and state reopening, and show that it very favorably competes with IW(i) on the Atari games. We then revisit the basic objective underlying deterministic online planning. We argue that the effectiveness of online planning for the Atari games and related problems can be further improved by considering this problem as a multiarmed bandit style competition between the various actions available at the state planned for, and not purely as a classical planning style action sequence optimization problem. Following this lead, we introduce a simple modification of prioritized IW(i) that fits the modified objective, and empirically demonstrate the prospects of this direction. |
| Researcher Affiliation | Academia | Alexander Shleyfman, Alexander Tuisov and Carmel Domshlak, Faculty of Industrial Engineering and Management, Technion, Israel |
| Pseudocode | No | The paper describes algorithms in prose but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper states 'We used the implementation of IW(1) by Lipovetzky et al. [2015], and have implemented p-IW(1) on top of it,' but does not provide an explicit statement about releasing their own code or a link to a repository. |
| Open Datasets | Yes | We tested p-IW(1) and IW(1) on 53 of the 55 different games considered by Bellemare et al. [2013] |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'the implementation of IW(1) by Lipovetzky et al. [2015]' and implementing 'p-IW(1) on top of it,' but does not provide specific software names with version numbers for replication. |
| Experiment Setup | Yes | each action selection decision was given a lookahead budget of 150000 simulated frames (or, equivalently, 30000 search nodes), the lookahead depth was limited to 1500 frames, and the accumulated rewards were discounted as R(s') = R(s) + γ^(d(s)+1) · r(s, a), where s is the unique parent of s', a is the respective action, and the discount factor was set to γ = 0.995. To reduce the variance, each game was played 30 times, with the reported results being averaged across these runs. |
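
The Research Type row quotes the paper's description of prioritized IW(i) as an extension of IW(i), a breadth-first lookahead that prunes non-novel states. The sketch below illustrates only the underlying IW(1)-style novelty-1 pruning that the paper builds on, not the authors' prioritized variant; the state, `successors`, and `features` interfaces are assumptions for illustration, not taken from the paper.

```python
from collections import deque

def iw1_lookahead(root, successors, features, node_budget=30000):
    """Breadth-first lookahead with novelty-1 pruning (IW(1)-style sketch).

    A generated state is kept only if at least one of its features (atoms)
    has not been seen in any previously generated state; otherwise it is
    pruned.  `successors(s)` is assumed to yield (action, child, reward)
    triples and `features(s)` an iterable of hashable atoms.
    """
    seen_atoms = set(features(root))
    queue = deque([root])
    expanded = 0
    while queue and expanded < node_budget:
        state = queue.popleft()
        expanded += 1
        for _action, child, _reward in successors(state):
            new_atoms = [a for a in features(child) if a not in seen_atoms]
            if not new_atoms:
                continue  # novelty > 1: prune this state
            seen_atoms.update(new_atoms)
            queue.append(child)
    return expanded
```

The paper's p-IW(1) further adds duplicate detection, state reopening, and a prioritized expansion order on top of this basic pruning scheme.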
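The Experiment Setup row quotes the discounted reward accumulation rule used in the lookahead. The following minimal sketch restates that formula in code under the reported constant γ = 0.995; the function name and arguments are illustrative assumptions.

```python
GAMMA = 0.995  # discount factor reported in the paper

def accumulate_reward(R_parent, depth_parent, immediate_reward):
    """Discounted accumulation R(s') = R(s) + γ^(d(s)+1) * r(s, a),
    where R_parent is the accumulated reward of the parent s,
    depth_parent is d(s), and immediate_reward is r(s, a)."""
    return R_parent + (GAMMA ** (depth_parent + 1)) * immediate_reward

# Example: a child of the root (parent depth 0) with immediate reward 10
# adds 0.995**1 * 10 = 9.95 to the parent's accumulated reward.
assert abs(accumulate_reward(0.0, 0, 10.0) - 9.95) < 1e-9
```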