Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark

Authors: Stefan O'Toole, Nir Lipovetzky, Miquel Ramirez, Adrian Pearce

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The algorithms are applied over the Atari-2600 games and our best performing algorithm, Novelty guided Critical Path Learning (N-CPL), outperforms the previously introduced width-based planning and learning algorithms 𝜋-IW(1), 𝜋-IW(1)+ and 𝜋-HIW(n, 1).
Researcher Affiliation | Academia | Stefan O'Toole, Computing and Information Systems, University of Melbourne, Australia (stefan@student.unimelb.edu.au); Nir Lipovetzky, Computing and Information Systems, University of Melbourne, Australia (nir.lipovetzky@unimelb.edu.au); Miquel Ramirez, Electrical and Electronic Engineering, University of Melbourne, Australia (miquel.ramirez@unimelb.edu.au); Adrian R. Pearce, Computing and Information Systems, University of Melbourne, Australia (adrianrp@unimelb.edu.au)
Pseudocode | Yes | Algorithm 1: Overview of the RIW(1) Algorithm (an illustrative novelty-test sketch follows the table)
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code release statement, or code in supplementary materials) for the methodology described in this paper.
Open Datasets | Yes | The Atari-2600 games can be accessed through the Arcade Learning Environment (ALE) [1]. (An ALE access sketch follows the table.)
Dataset Splits | No | The paper describes its training budget and evaluation process but does not specify explicit train/validation/test dataset splits (e.g., percentages or sample counts) for a static dataset.
Hardware Specification | Yes | We ran 80 independent trials at once over 80 Intel Xeon 2.10GHz processors with 720GB of shared RAM, limiting each trial to run on a single vCPU. (A parallel-trials sketch follows the table.)
Software Dependencies | No | The paper mentions using neural networks, but does not provide specific ancillary software details such as library or solver names with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | Following previous width-based planning papers [6, 8, 10] we use a frameskip of 15. We keep our experimental settings the same as Junyent et al. [10], including a training budget of 2 × 10^7 simulator interactions and allowing 100 simulator interactions at each planning time step, which allows almost real-time planning. (A configuration sketch follows the table.)
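
Hedged sketches for the rows flagged above follow. First, the paper reports pseudocode for RIW(1) (Algorithm 1). The snippet below is only an illustrative Python rendering of the width-1 novelty test that rollout-based IW(1) lookaheads rely on; the class and function names are ours, not the authors' implementation.

```python
from collections import defaultdict
from typing import Hashable, Iterable

class NoveltyTable:
    """Width-1 novelty table: remembers the shallowest search depth at which
    each state feature has been reached so far."""

    def __init__(self) -> None:
        # Unseen features default to "infinite" depth.
        self.best_depth = defaultdict(lambda: float("inf"))

    def is_novel(self, features: Iterable[Hashable], depth: int) -> bool:
        """A state is novel iff some feature is reached at a strictly
        shallower depth than ever before; the table is updated in place,
        as in rollout-style IW(1) lookaheads."""
        novel = False
        for f in features:
            if depth < self.best_depth[f]:
                self.best_depth[f] = depth
                novel = True
        return novel

# Toy usage: a rollout is pruned as soon as its current state is not novel.
table = NoveltyTable()
assert table.is_novel({"f1", "f2"}, depth=3)      # both features unseen
assert table.is_novel({"f2"}, depth=1)            # f2 reached at a shallower depth
assert not table.is_novel({"f1", "f2"}, depth=5)  # no improvement, prune
```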
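Second, the Open Datasets row points to the Arcade Learning Environment. A minimal way to drive an Atari-2600 game, assuming the `ale-py` Python bindings and a locally obtained ROM file (the paper does not state which interface or version was used, so treat this as a sketch):

```python
import random

from ale_py import ALEInterface

ale = ALEInterface()
ale.setInt("random_seed", 0)
ale.loadROM("breakout.bin")  # path to a ROM obtained separately

actions = ale.getMinimalActionSet()
episode_return = 0.0
while not ale.game_over():
    # A random policy stands in here for the width-based lookahead.
    episode_return += ale.act(random.choice(actions))
print("Episode return:", episode_return)
```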
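Third, the hardware row reports 80 independent trials run at once, each limited to a single vCPU. One plausible (assumed) way to launch such trials is sketched below; `run_trial` is a hypothetical entry point for a single training and evaluation run, and the CPU pinning is Linux-specific.

```python
import multiprocessing as mp
import os

def run_trial(trial_id: int) -> float:
    # Optionally pin this worker to one logical CPU (Linux only).
    os.sched_setaffinity(0, {trial_id % os.cpu_count()})
    # ... run one full training/evaluation trial here (placeholder) ...
    return 0.0  # placeholder for the trial's final score

if __name__ == "__main__":
    with mp.Pool(processes=80) as pool:  # 80 trials at once, one per vCPU
        scores = pool.map(run_trial, range(80))
```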
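Finally, the experiment-setup row fixes a frameskip of 15, a training budget of 2 × 10^7 simulator interactions, and 100 simulator interactions per planning step. The skeleton below is an assumed rendering of that outer loop; `planner.plan_one_step` and `policy.update` are hypothetical placeholders for the lookahead and the learnt base policy, not the authors' code.

```python
FRAMESKIP = 15                  # emulator frames per action (ALE "frame_skip")
INTERACTIONS_PER_STEP = 100     # simulator calls allowed per planning step
TRAINING_BUDGET = 2 * 10**7     # total simulator interactions during training

def run_training(env, planner, policy):
    """Assumed outer loop: plan, act, learn, until the budget is spent."""
    used = 0
    state = env.reset()
    while used < TRAINING_BUDGET:
        # The lookahead may use at most INTERACTIONS_PER_STEP simulator calls.
        action, calls = planner.plan_one_step(
            state, policy, budget=INTERACTIONS_PER_STEP)
        used += calls
        state, reward, done = env.step(action)  # env applies FRAMESKIP internally
        policy.update(state, action, reward)    # learnt base policy / heuristic
        if done:
            state = env.reset()
```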