Adaptive Pairwise Weights for Temporal Credit Assignment
Authors: Zeyu Zheng, Risto Vuorio, Richard Lewis, Satinder Singh
Pages: 9225-9232
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this empirical paper, we explore heuristics based on more general pairwise weightings... |
| Researcher Affiliation | Academia | University of Michigan; University of Oxford |
| Pseudocode | No | The paper states 'An overview of the algorithm is in the appendix' but does not include any pseudocode or clearly labeled algorithm blocks in the provided text. |
| Open Source Code | No | The paper mentions 'We compare against TVT by using their published code', which refers to third-party code, but it gives no explicit statement about, or link to, open-source code for the method described in this paper. |
| Open Datasets | Yes | We evaluated Meta-PWTD and -PWR [on] the Key-to-Door (KtD) environment (Hung et al. 2019) that is an elaborate umbrella problem that was designed to show-off the TVT algorithm's ability to solve TCA. ... bsuite (Osband et al. 2019) and Atari (Bellemare et al. 2013), both standard RL benchmarks. |
| Dataset Splits | No | The paper mentions tuning hyperparameters and repeating runs with different random seeds but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware specifications (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper discusses various algorithms and environments but does not provide a reproducible description of ancillary software with specific version numbers (e.g., 'Python 3.8, PyTorch 1.9'). |
| Experiment Setup | Yes | We tuned hyperparameters for each method on the mid-level configuration µ = 5, σ = 25 and kept them fixed for the other 8 configurations. Each method has a distinct set of parameters (e.g. outer-loop learning rates, λ). More details are in the appendix. |
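The table describes the paper's approach only as exploring "more general pairwise weightings". A minimal sketch of what such a weighting generalizes, assuming a weight w(t, i) attached to each pair of a time step t and a later reward index i (our notation, not necessarily the paper's), is shown below. With the hypothetical choice w(t, i) = γ^(i−t) it reduces to the ordinary discounted return. The function names are illustrative only, and the paper's weights are learned adaptively (Meta-PWTD, Meta-PWR), which this sketch does not attempt to model.

```python
import numpy as np


def pairwise_weighted_returns(rewards, weight_fn):
    """Return G[t] = sum over i >= t of weight_fn(t, i) * rewards[i].

    `weight_fn` is a hypothetical stand-in for a pairwise weight; here it is a
    fixed function, whereas the paper adapts its weights during learning.
    """
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        # Credit every reward at or after step t according to its pairwise weight.
        for i in range(t, T):
            returns[t] += weight_fn(t, i) * rewards[i]
    return returns


def discounted_weight(gamma):
    # Special case: the usual discounted return uses w(t, i) = gamma ** (i - t).
    return lambda t, i: gamma ** (i - t)


if __name__ == "__main__":
    rewards = [0.0, 0.0, 1.0]
    print(pairwise_weighted_returns(rewards, discounted_weight(0.5)))
```

With rewards [0, 0, 1] and γ = 0.5, the discounted special case yields returns of 0.25, 0.5 and 1.0; any other `weight_fn` changes how credit for each reward is shared across earlier steps.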