Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Planning and Learning with Adaptive Lookahead

Authors: Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.
Researcher Affiliation | Collaboration | Aviv Rosenberg¹*, Assaf Hallak², Shie Mannor²,³, Gal Chechik²,⁴, Gal Dalal²; ¹Amazon Science, ²Nvidia Research, ³Technion, ⁴Bar-Ilan University
Pseudocode | Yes | Algorithm 1: TLPI
Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the methodology described, nor does it include a link to a code repository.
Open Datasets | Yes | We train QL-DQN on several Atari environments (Bellemare et al. 2013).
Dataset Splits | No | The paper references training and testing phases but does not explicitly provide details about specific train/validation/test dataset splits (e.g., percentages, sample counts, or specific predefined splits) for reproducibility.
Hardware Specification | No | The paper mentions 'efficient parallel Atari simulation on GPU' but does not provide specific hardware details such as GPU models, CPU types, or other hardware specifications used for running the experiments.
Software Dependencies | No | The paper does not explicitly provide specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8') that would enable replication of the experimental environment.
Experiment Setup | Yes | In all our experiments we run QLPI with θ1 = 1 and θ3 = θ5 = θ6 = θ7 = 0 (again Ṽ = V). For (θ2, θ4, θ8) we set the following values: (0.3, 0.2, 0.1), (0.2, 0.15, 0.05), (0.2, 0.05, 0.02) and (0.1, 0.05, 0.02), which respectively depict decreasing weights to depths 2, 4, 8.
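
The quoted setup fixes θ1, θ3, θ5, θ6, θ7 and sweeps four triples for (θ2, θ4, θ8). As a minimal sketch, those settings can be written out as full per-depth vectors for anyone attempting a re-run. All names here (`MAX_DEPTH`, `FIXED`, `SWEEP`, `build_theta`, `CONFIGS`) are hypothetical and not taken from the paper or any released code. Only the numeric values come from the quoted text. The sketch does not model how QLPI consumes these weights.

```python
# Hypothetical encoding of the theta settings quoted in the Experiment Setup row.
# Names are illustrative assumptions; only the numbers come from the quoted text.

MAX_DEPTH = 8  # the quoted setup mentions theta_1 through theta_8

# Values held fixed in all experiments: theta_1 = 1, theta_3 = theta_5 = theta_6 = theta_7 = 0.
FIXED = {1: 1.0, 3: 0.0, 5: 0.0, 6: 0.0, 7: 0.0}

# The four (theta_2, theta_4, theta_8) triples listed in the quoted text.
SWEEP = [
    (0.3, 0.2, 0.1),
    (0.2, 0.15, 0.05),
    (0.2, 0.05, 0.02),
    (0.1, 0.05, 0.02),
]


def build_theta(theta2, theta4, theta8):
    """Return [theta_1, ..., theta_8] for one swept triple."""
    theta = dict(FIXED)
    theta.update({2: theta2, 4: theta4, 8: theta8})
    return [theta[h] for h in range(1, MAX_DEPTH + 1)]


CONFIGS = [build_theta(*triple) for triple in SWEEP]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(cfg)
```

Each vector is indexed by depth minus one, so `CONFIGS[0][1]` is θ2 of the first setting. Within every triple the weights decrease from depth 2 to depth 8, matching the "decreasing weights" wording in the quoted text.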