Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark
Authors: Stefan O'Toole, Nir Lipovetzky, Miquel Ramirez, Adrian Pearce
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The algorithms are applied over the Atari-2600 games and our best performing algorithm, Novelty guided Critical Path Learning (N-CPL), outperforms the previously introduced width-based planning and learning algorithms 𝜋-IW(1), 𝜋-IW(1)+ and 𝜋-HIW(n, 1). |
| Researcher Affiliation | Academia | Stefan O'Toole, Computing and Information Systems, University of Melbourne, Australia, EMAIL; Nir Lipovetzky, Computing and Information Systems, University of Melbourne, Australia, EMAIL; Miquel Ramirez, Electrical and Electronic Engineering, University of Melbourne, Australia, EMAIL; Adrian R. Pearce, Computing and Information Systems, University of Melbourne, Australia, EMAIL |
| Pseudocode | Yes | Algorithm 1: Overview of the RIW(1) Algorithm |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | The Atari-2600 games can be accessed through the Arcade Learning Environment (ALE) [1] |
| Dataset Splits | No | The paper describes its training budget and evaluation process but does not specify explicit train/validation/test dataset splits (e.g., percentages or sample counts) for a static dataset. |
| Hardware Specification | Yes | We ran 80 independent trials at once over 80 Intel Xeon 2.10GHz processors with 720GB of shared RAM, limiting each trial to run on a single vCPU. |
| Software Dependencies | No | The paper mentions using Neural Networks, but does not provide specific ancillary software details like library or solver names with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Following previous width-based planning papers [6, 8, 10] we use a frameskip of 15. We keep our experimental settings the same as Junyent et al. [10] including a training budget of 2 × 10^7 simulator interactions and allowing 100 simulator interactions at each planning time step, which allows almost real-time planning. |
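
The budget figures quoted in the Experiment Setup row imply some simple arithmetic, which the sketch below works through. It is only an illustration: the class and method names are hypothetical and do not come from the paper, and the planning-step count assumes every planning step uses its full allowance of 100 simulator interactions, which need not hold in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the settings quoted in the table."""

    frameskip: int = 15                  # emulator frames per simulator interaction
    training_budget: int = 20_000_000    # 2 x 10^7 simulator interactions in total
    interactions_per_step: int = 100     # simulator interactions per planning time step

    def planning_steps(self) -> int:
        # Assumption: each planning step consumes its full interaction allowance.
        return self.training_budget // self.interactions_per_step

    def total_frames(self) -> int:
        # Each simulator interaction advances the emulator by `frameskip` frames.
        return self.training_budget * self.frameskip


setup = ExperimentSetup()
print(setup.planning_steps())  # planning steps covered by the training budget
print(setup.total_frames())    # emulator frames simulated over training
```

Under these assumptions the training budget covers 200,000 planning steps and 3 × 10^8 emulator frames per trial.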