Quantile Credit Assignment
Authors: Thomas Mesnard, Wenqi Chen, Alaa Saade, Yunhao Tang, Mark Rowland, Theophane Weber, Clare Lyle, Audrunas Gruslys, Michal Valko, Will Dabney, Georg Ostrovski, Eric Moulines, Remi Munos
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show theoretically that this approach gives an unbiased policy gradient estimator that can yield significant variance reductions over a standard value estimate baseline. QCA and HQCA significantly outperform prior state-of-the-art methods on a range of extremely difficult credit assignment problems. |
| Researcher Affiliation | Collaboration | ¹DeepMind, ²Harvard University, ³École polytechnique. |
| Pseudocode | Yes | Figure 1. Architecture and pseudocode of the QCA algorithm. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper describes custom-built environments (Key-To-Door variants, Combinatorial RL) but does not provide access information (URL, DOI, repository) for pre-existing datasets. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) as it operates on simulated environments generating trajectories. |
| Hardware Specification | No | The paper describes the neural network architectures and training process but does not specify any particular hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimization algorithms (RMSprop) and loss functions but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For High-Variance Key-To-Door, the optimal hyperparameters found for each algorithm can be found in Table 1. |