Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark
Authors: Stefan O'Toole, Nir Lipovetzky, Miquel Ramirez, Adrian Pearce
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The algorithms are applied over the Atari-2600 games and our best performing algorithm, Novelty guided Critical Path Learning (N-CPL), outperforms the previously introduced width-based planning and learning algorithms π-IW(1), π-IW(1)+ and π-HIW(n, 1). |
| Researcher Affiliation | Academia | Stefan O'Toole, Computing and Information Systems, University of Melbourne, Australia, stefan@student.unimelb.edu.au; Nir Lipovetzky, Computing and Information Systems, University of Melbourne, Australia, nir.lipovetzky@unimelb.edu.au; Miquel Ramirez, Electrical and Electronic Engineering, University of Melbourne, Australia, miquel.ramirez@unimelb.edu.au; Adrian R. Pearce, Computing and Information Systems, University of Melbourne, Australia, adrianrp@unimelb.edu.au |
| Pseudocode | Yes | Algorithm 1: Overview of the RIW(1) Algorithm |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | The Atari-2600 games can be accessed through the Arcade Learning Environment (ALE) [1] |
| Dataset Splits | No | The paper describes its training budget and evaluation process but does not specify explicit train/validation/test dataset splits (e.g., percentages or sample counts) for a static dataset. |
| Hardware Specification | Yes | We ran 80 independent trials at once over 80 Intel Xeon 2.10GHz processors with 720GB of shared RAM, limiting each trial to run on a single vCPU. |
| Software Dependencies | No | The paper mentions using Neural Networks, but does not provide specific ancillary software details like library or solver names with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Following previous width-based planning papers [6, 8, 10] we use a frameskip of 15. We keep our experimental settings the same as Junyent et al. [10] including a training budget of 2 × 10^7 simulator interactions and allowing 100 simulator interactions at each planning time step, which allows almost real-time planning. |
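The budgets quoted above (a total training budget of 2 × 10^7 simulator interactions, at most 100 interactions per planning step, frameskip of 15) can be sketched as a simple accounting loop. This is an illustrative sketch only, not the authors' code: the `CountingSimulator` and `plan_step` names are hypothetical stand-ins, and a smaller budget is used so the loop is quick to follow.

```python
TRAINING_BUDGET = 2 * 10**7  # total simulator interactions in the paper's setup
PLANNING_BUDGET = 100        # simulator interactions allowed per planning step
FRAMESKIP = 15               # each interaction advances the emulator 15 frames

class CountingSimulator:
    """Hypothetical stub that only tracks how many interactions were spent."""
    def __init__(self):
        self.interactions = 0

    def step(self, action):
        # One interaction corresponds to one frameskipped emulator transition.
        self.interactions += 1

def plan_step(sim, budget=PLANNING_BUDGET):
    """Spend up to `budget` interactions exploring from the current state."""
    for _ in range(budget):
        sim.step(action=0)

# Use a small budget here purely for illustration of the accounting.
small_budget = 10_000
sim = CountingSimulator()
planning_steps = 0
while sim.interactions + PLANNING_BUDGET <= small_budget:
    plan_step(sim)
    planning_steps += 1
```

With a 10,000-interaction budget and 100 interactions per step, the loop performs exactly 100 planning steps; the same accounting with `TRAINING_BUDGET` yields 2 × 10^5 planning steps over training.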