Planning and Learning with Adaptive Lookahead
Authors: Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari. |
| Researcher Affiliation | Collaboration | Aviv Rosenberg (1*), Assaf Hallak (2), Shie Mannor (2,3), Gal Chechik (2,4), Gal Dalal (2); 1 Amazon Science, 2 Nvidia Research, 3 Technion, 4 Bar-Ilan University |
| Pseudocode | Yes | Algorithm 1: TLPI |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | We train QL-DQN on several Atari environments (Bellemare et al. 2013). |
| Dataset Splits | No | The paper references training and testing phases but does not explicitly provide details about specific train/validation/test dataset splits (e.g., percentages, sample counts, or specific predefined splits) for reproducibility. |
| Hardware Specification | No | The paper mentions 'efficient parallel Atari simulation on GPU' but does not provide specific hardware details such as GPU models, CPU types, or other hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly provide specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8') that would enable replication of the experimental environment. |
| Experiment Setup | Yes | In all our experiments we run QLPI with θ1 = 1 and θ3 = θ5 = θ6 = θ7 = 0 (again Ṽ = V). For (θ2, θ4, θ8) we set the following values: (0.3, 0.2, 0.1), (0.2, 0.15, 0.05), (0.2, 0.05, 0.02) and (0.1, 0.05, 0.02), which respectively assign decreasing weights to depths 2, 4, 8. *(See the configuration sketch below the table.)* |
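
The Experiment Setup row reports quantile-style weights (θ2, θ4, θ8) that give a decreasing share of states to lookahead depths 2, 4, and 8, with θ1 = 1 meaning every state gets at least depth 1. The sketch below is one hedged reading of how such fractions could be turned into per-state depth assignments; the `assign_lookahead_depths` helper, the score-quantile selection rule, and the generic per-state score are assumptions made for illustration, not the paper's QLPI implementation.

```python
import numpy as np

def assign_lookahead_depths(state_scores, depth_fractions):
    """Map each state to a lookahead depth from quantile-style fractions.

    state_scores:    one score per state (the concrete per-state score used by
                     QLPI is not reproduced here; any "how much does this state
                     need deeper planning" proxy would slot in).
    depth_fractions: dict {depth: fraction of states that should receive
                     lookahead of at least that depth}. Depth 1 is the default
                     for every state, mirroring theta_1 = 1 in the reported setup.
    """
    scores = np.asarray(state_scores, dtype=float)
    depths = np.ones(len(scores), dtype=int)  # theta_1 = 1: all states get depth >= 1
    for depth in sorted(depth_fractions):     # shallower depths first; deeper ones overwrite
        fraction = depth_fractions[depth]
        if fraction <= 0:
            continue
        cutoff = np.quantile(scores, 1.0 - fraction)  # threshold marking the top `fraction`
        depths[scores >= cutoff] = depth
    return depths

# One of the reported configurations: decreasing weight on depths 2, 4, 8.
rng = np.random.default_rng(0)
print(assign_lookahead_depths(rng.random(100), {2: 0.3, 4: 0.2, 8: 0.1}))
```

Under this reading, the (0.3, 0.2, 0.1) configuration would leave roughly 70% of states at depth 1, with about 10% each planned at depths 2, 4, and 8.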