Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Planning and Learning with Adaptive Lookahead
Authors: Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari. |
| Researcher Affiliation | Collaboration | Aviv Rosenberg¹*, Assaf Hallak², Shie Mannor²,³, Gal Chechik²,⁴, Gal Dalal²; ¹Amazon Science, ²Nvidia Research, ³Technion, ⁴Bar-Ilan University |
| Pseudocode | Yes | Algorithm 1: TLPI |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | We train QL-DQN on several Atari environments (Bellemare et al. 2013). |
| Dataset Splits | No | The paper references training and testing phases but does not explicitly provide details about specific train/validation/test dataset splits (e.g., percentages, sample counts, or specific predefined splits) for reproducibility. |
| Hardware Specification | No | The paper mentions 'efficient parallel Atari simulation on GPU' but does not provide specific hardware details such as GPU models, CPU types, or other hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly provide specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8') that would enable replication of the experimental environment. |
| Experiment Setup | Yes | In all our experiments we run QLPI with θ1 = 1 and θ3 = θ5 = θ6 = θ7 = 0 (again Ṽ = V). For (θ2, θ4, θ8) we set the following values: (0.3, 0.2, 0.1), (0.2, 0.15, 0.05), (0.2, 0.05, 0.02) and (0.1, 0.05, 0.02), which respectively depict decreasing weights to depths 2, 4, 8. |
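
For readers who want to reuse the quoted Experiment Setup values, the four settings can be written out as full parameter vectors over depths 1 to 8. The following is a minimal sketch, not the authors' code: the helper name `full_theta` and the dictionary layout are assumptions made for illustration, and the snippet only spells out the θ values quoted in the table without implementing QLPI itself.

```python
# Hypothetical helper (not from the paper): expand a (θ2, θ4, θ8) triple
# into the full vector θ1..θ8 described in the Experiment Setup row.
def full_theta(theta2, theta4, theta8):
    theta = {h: 0.0 for h in range(1, 9)}  # θ3 = θ5 = θ6 = θ7 = 0
    theta[1] = 1.0                         # θ1 = 1 in all experiments
    theta[2], theta[4], theta[8] = theta2, theta4, theta8
    return theta

# The four (θ2, θ4, θ8) settings quoted in the table.
SETTINGS = [
    (0.3, 0.2, 0.1),
    (0.2, 0.15, 0.05),
    (0.2, 0.05, 0.02),
    (0.1, 0.05, 0.02),
]

CONFIGS = [full_theta(*s) for s in SETTINGS]

for cfg in CONFIGS:
    # One row per setting, ordered by depth 1..8.
    print([cfg[h] for h in range(1, 9)])
```

Keeping the zero entries explicit makes it easy to check that only depths 1, 2, 4 and 8 carry non-zero weight in every setting.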