Planning and Learning with Adaptive Lookahead

Authors: Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari."
Researcher Affiliation | Collaboration | Aviv Rosenberg (Amazon Science), Assaf Hallak (Nvidia Research), Shie Mannor (Nvidia Research, Technion), Gal Chechik (Nvidia Research, Bar-Ilan University), Gal Dalal (Nvidia Research)
Pseudocode | Yes | Algorithm 1: TLPI (an illustrative lookahead sketch follows the table)
Open Source Code | No | The paper does not state that source code for the method is released, nor does it link to a code repository.
Open Datasets | Yes | "We train QL-DQN on several Atari environments (Bellemare et al. 2013)." (an example environment setup follows the table)
Dataset Splits | No | The paper refers to training and testing phases but gives no explicit train/validation/test splits (e.g., percentages, sample counts, or predefined splits).
Hardware Specification | No | The paper mentions "efficient parallel Atari simulation on GPU" but does not report specific hardware such as GPU models or CPU types used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., "PyTorch 1.9", "Python 3.8") that would allow replicating the experimental environment.
Experiment Setup | Yes | "In all our experiments we run QLPI with θ1 = 1 and θ3 = θ5 = θ6 = θ7 = 0 (again Ṽ = V). For (θ2, θ4, θ8) we set the following values: (0.3, 0.2, 0.1), (0.2, 0.15, 0.05), (0.2, 0.05, 0.02) and (0.1, 0.05, 0.02), which respectively depict decreasing weights to depths 2, 4, 8." (a configuration sketch follows the table)
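The Pseudocode row refers to Algorithm 1 (TLPI), which is not reproduced here. The sketch below only illustrates a generic building block that multi-step lookahead policies rely on: an h-step Bellman backup over a known tabular model that bootstraps from an approximate value function at the leaves, where the depth h may differ from state to state. All function and variable names are ours, not the paper's.

```python
import numpy as np

def lookahead_q_values(P, R, V, s, h, gamma=0.99):
    """Optimal h-step lookahead Q-values at state s (h >= 1), bootstrapping
    from the approximate value function V at the leaves.

    P: transition probabilities, shape (S, A, S).  R: rewards, shape (S, A).
    Generic illustration only; not the paper's TLPI/QLPI code.
    """
    q = np.zeros(P.shape[1])
    for a in range(P.shape[1]):
        backup = 0.0
        for s2 in range(P.shape[2]):
            if P[s, a, s2] == 0.0:
                continue
            # Leaf nodes bootstrap from V; inner nodes recurse one level deeper.
            future = V[s2] if h == 1 else lookahead_q_values(P, R, V, s2, h - 1, gamma).max()
            backup += P[s, a, s2] * future
        q[a] = R[s, a] + gamma * backup
    return q

def lookahead_greedy_action(P, R, V, s, h, gamma=0.99):
    """Action chosen by an h-step lookahead greedy policy at state s."""
    return int(np.argmax(lookahead_q_values(P, R, V, s, h, gamma)))
```

In the paper, the depth used at each state is chosen adaptively rather than fixed; the TLPI thresholds and the QLPI weights quoted in the Experiment Setup row govern that choice, but their exact decision rule is not reproduced here.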
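The Open Datasets row points to the Arcade Learning Environment (Bellemare et al. 2013). The paper does not say which software interface was used; the snippet below assumes the Gymnasium ALE bindings purely as one common way to instantiate comparable environments, with a hypothetical helper name and game choice.

```python
# Hypothetical environment setup: the paper does not specify its Atari
# interface, so this assumes the Gymnasium bindings for the Arcade
# Learning Environment (pip install "gymnasium[atari,accept-rom-license]").
import gymnasium as gym

def make_atari_env(game: str = "ALE/Breakout-v5", seed: int = 0):
    """Create a single ALE environment with a fixed seed for reproducibility."""
    env = gym.make(game)
    env.reset(seed=seed)
    return env

if __name__ == "__main__":
    env = make_atari_env()
    obs, info = env.reset()
    print("observation shape:", obs.shape, "actions:", env.action_space.n)
    env.close()
```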
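The Experiment Setup row quotes the fixed depth-weight settings used for QLPI. A minimal sketch of how those four configurations could be recorded for a reproduction attempt is below; the dictionary layout and names are our own illustration, not code from the paper.

```python
# Depth-weight settings for QLPI as quoted in the paper's experiment setup.
# theta[h] is the weight assigned to lookahead depth h: depth 1 always gets
# weight 1, depths 3, 5, 6, 7 always get weight 0, and the four settings
# below give decreasing weight to depths 2, 4, 8.
# Layout and names are illustrative, not the authors' code.

BASE_THETA = {1: 1.0, 3: 0.0, 5: 0.0, 6: 0.0, 7: 0.0}

QLPI_CONFIGS = [
    {**BASE_THETA, 2: 0.30, 4: 0.20, 8: 0.10},
    {**BASE_THETA, 2: 0.20, 4: 0.15, 8: 0.05},
    {**BASE_THETA, 2: 0.20, 4: 0.05, 8: 0.02},
    {**BASE_THETA, 2: 0.10, 4: 0.05, 8: 0.02},
]

if __name__ == "__main__":
    for i, cfg in enumerate(QLPI_CONFIGS, start=1):
        print(f"config {i}:", {h: cfg[h] for h in sorted(cfg)})
```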