Classical Planning with Simulators: Results on the Atari Video Games
Authors: Nir Lipovetzky, Miquel Ramirez, Hector Geffner
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical results over 54 Atari games show that the simplest such algorithm performs at the level of UCT, the state-of-the-art planning method in this domain, and suggest the potential of width-based methods for planning with simulators when factored, compact action models are not available. |
| Researcher Affiliation | Academia | Nir Lipovetzky, The University of Melbourne, Melbourne, Australia, nir.lipovetzky@unimelb.edu.au; Miquel Ramirez, Australian National University, Canberra, Australia, miquel.ramirez@anu.edu.au; Hector Geffner, ICREA & U. Pompeu Fabra, Barcelona, Spain, hector.geffner@upf.edu |
| Pseudocode | No | The paper describes the Iterated Width (IW) algorithm and its variations in detail using text, but it does not include a formally labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing its source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We tested IW(1) and 2BFS over 54 of the 55 different games considered in [Bellemare et al., 2013], from now on abbreviated as BNVB. The Arcade Learning Environment (ALE) provides a challenging platform for evaluating general, domain-independent AI planners and learners through a convenient interface to hundreds of Atari 2600 games [Bellemare et al., 2013]. |
| Dataset Splits | No | The paper describes the experimental setup in terms of budget and frames for lookahead search within the Atari games environment. However, it does not specify any training, validation, or test dataset splits in the traditional machine learning sense, as the evaluation is done directly on the game environments. |
| Hardware Specification | Yes | Experiments were run on a cluster, where each computing node consists of a 6-core Intel Xeon E5-2440, with 2.4 GHz clock speed, with 64 GBytes of RAM installed. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, libraries, frameworks). It mentions using the Arcade Learning Environment, but without version details for the software stack. |
| Experiment Setup | Yes | For the experiments below, we added two simple variations to IW(1) and 2BFS. First, in the breadth-first search underlying IW(1), we generate the children in random order. Second, a discount factor γ = 0.995 is used in both algorithms for discounting future rewards like in UCT. Our experimental setup follows theirs except that a maximum budget of 150,000 simulated frames is applied to IW(1), 2BFS, and UCT. IW(1) and 2BFS are limited to search up to a depth of 1,500 frames and up to 150,000 frames per root branch. |
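
The setup quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: it shows one way a breadth-first lookahead with IW(1)-style novelty pruning, random child order, a discount factor of 0.995, and a node budget might be put together. The names `iw1_lookahead`, `step`, `atoms`, and the toy corridor domain are hypothetical. The sketch also makes several simplifying assumptions: it counts simulator calls rather than simulated frames, it measures depth in search steps, it scores pruned nodes as leaves, and it applies a single overall budget rather than one per root branch.

```python
import random
from collections import deque


def iw1_lookahead(root, actions, step, atoms,
                  budget=150_000, max_depth=1_500, gamma=0.995, seed=0):
    """Breadth-first lookahead with IW(1)-style novelty pruning (illustrative).

    step(state, action) -> (next_state, reward)   # hypothetical simulator call
    atoms(state)        -> iterable of hashable atoms describing the state
    Returns (best first action, its discounted return, nodes generated).
    """
    rng = random.Random(seed)
    seen = set(atoms(root))                 # atoms made true so far in the search
    # Queue entries: (state, depth, discounted return so far, first action taken)
    queue = deque([(root, 0, 0.0, None)])
    best_action, best_return = None, float("-inf")
    generated = 0
    while queue and generated < budget:
        state, depth, ret, first = queue.popleft()
        if depth >= max_depth:
            continue
        order = list(actions)
        rng.shuffle(order)                  # children generated in random order
        for action in order:
            if generated >= budget:
                break
            nxt, reward = step(state, action)
            generated += 1
            # Assumption: a reward collected at depth d is weighted by gamma**d.
            new_ret = ret + (gamma ** depth) * reward
            root_action = action if first is None else first
            if new_ret > best_return:
                best_return, best_action = new_ret, root_action
            new_atoms = set(atoms(nxt)) - seen
            if not new_atoms:
                continue                    # prune: no atom is true for the first time
            seen |= new_atoms
            queue.append((nxt, depth + 1, new_ret, root_action))
    return best_action, best_return, generated


# Toy domain (hypothetical): a corridor of positions -3..3, reward on first entering 3.
def corridor_step(pos, action):
    nxt = max(-3, min(3, pos + action))
    reward = 1.0 if (nxt == 3 and pos != 3) else 0.0
    return nxt, reward


def corridor_atoms(pos):
    return [("pos", pos)]


action, value, generated = iw1_lookahead(0, [-1, +1], corridor_step, corridor_atoms)
print(action, round(value, 6), generated)
```

In the toy corridor every position is a single atom, so each position is expanded at most once and the search stays linear in the number of positions. Changing `seed` alters the order in which children are generated, but in this small domain it does not change the selected action.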