Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Authors: Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide an experimental study comparing a variety of model-based distributional RL algorithms, with several key takeaways for practitioners. |
| Researcher Affiliation | Industry | Mark Rowland, Google DeepMind; Li Kevin Wenliang, Google DeepMind; Rémi Munos, FAIR, Meta; Clare Lyle, Google DeepMind; Yunhao Tang, Google DeepMind; Will Dabney, Google DeepMind |
| Pseudocode | Yes | Algorithm 1: The direct categorical fixed-point algorithm (DCFP). (A hedged sketch of the idea appears below the table.) |
| Open Source Code | No | The paper provides code snippets in Appendix G.5, marked with a copyright notice and license, but does not link to an external repository containing the full implementation of the described methodology. |
| Open Datasets | No | The paper describes using a 'generative model' to obtain N i.i.d. samples for each state, a method of data generation rather than use of a pre-existing, publicly available dataset. (A sampling sketch appears below the table.) |
| Dataset Splits | No | The paper does not explicitly mention training/validation/test splits, as it draws data from a generative model and evaluates on constructed environments. |
| Hardware Specification | No | All experiments were run using the Python 3 language (Van Rossum and Drake, 2009)... As all experiments are tabular, each run uses a single CPU, and timings are reported within the experimental results. The paper does not specify a particular CPU model or other specific hardware details. |
| Software Dependencies | No | All experiments were run using the Python 3 language (Van Rossum and Drake, 2009), and made use of NumPy (Harris et al., 2020), SciPy (Virtanen et al., 2020), Matplotlib (Hunter, 2007), and Seaborn (Waskom, 2021) libraries. The paper provides publication years for the libraries but not explicit version numbers (e.g., NumPy 1.x or SciPy 1.x). |
| Experiment Setup | Yes | For each setting, we repeat the experiment 30 times with different sampled transitions. We display trade-off plots... for the cases of m ∈ {30, 100, 300, 1000} atoms and using N = 10^6 sample transitions from each state to estimate the transition matrix. We ran all DP methods with 30,000 iterations. (Hedged sketches of this pipeline follow the table.) |
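
The Open Datasets and Experiment Setup rows describe data generated on the fly: N i.i.d. sample transitions drawn from a generative model for each state, then normalised into an empirical transition matrix. Below is a minimal sketch of that estimation step, assuming a tabular MDP; `sample_next_state` and the synthetic 3-state chain are illustrative stand-ins, not the paper's environments.

```python
import numpy as np

def empirical_transition_matrix(sample_next_state, num_states, num_samples, rng):
    """Estimate the transition matrix from a generative model by drawing
    num_samples i.i.d. next-state samples for each state and normalising
    the empirical counts. `sample_next_state(x, rng)` is a hypothetical
    stand-in for the generative model."""
    counts = np.zeros((num_states, num_states))
    for x in range(num_states):
        for _ in range(num_samples):
            counts[x, sample_next_state(x, rng)] += 1.0
    return counts / num_samples

# Example on a synthetic 3-state chain (illustrative only):
rng = np.random.default_rng(0)
P_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.3, 0.0, 0.7]])
sample = lambda x, rng: rng.choice(3, p=P_true[x])
P_hat = empirical_transition_matrix(sample, num_states=3, num_samples=10_000, rng=rng)
```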
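
The Pseudocode row cites Algorithm 1 (DCFP), which computes the fixed point of the projected categorical Bellman operator directly rather than by iterating it. Here is a minimal sketch of that idea in probability-mass form: the categorical projection is linear in the input masses, so the fixed point solves a single linear system. Note the paper states its algorithm in terms of CDFs; the function and variable names below are assumptions, not the paper's implementation.

```python
import numpy as np

def categorical_projection_matrix(z, r, gamma):
    """G[i, j]: mass that a Dirac at r + gamma * z[j] assigns to atom z[i]
    under the standard triangular-kernel categorical projection."""
    m = len(z)
    G = np.zeros((m, m))
    for j, g in enumerate(r + gamma * z):
        if g <= z[0]:
            G[0, j] = 1.0
        elif g >= z[-1]:
            G[-1, j] = 1.0
        else:
            k = np.searchsorted(z, g) - 1          # z[k] <= g < z[k+1]
            w = (g - z[k]) / (z[k + 1] - z[k])
            G[k, j], G[k + 1, j] = 1.0 - w, w
    return G

def dcfp_sketch(P, rewards, gamma, z):
    """Solve for the categorical fixed point in one linear solve.
    P: (S, S) transition matrix under the policy; rewards: (S,) expected
    reward per state; z: (m,) atom support. Returns (S, m) atom masses."""
    S, m = P.shape[0], len(z)
    # The projected operator is linear in the stacked mass vector p:
    # (Ap)[x*m + i] = sum_{x', j} P[x, x'] * G_x[i, j] * p[x'*m + j].
    A = np.zeros((S * m, S * m))
    for x in range(S):
        G = categorical_projection_matrix(z, rewards[x], gamma)
        for xp in range(S):
            A[x * m:(x + 1) * m, xp * m:(xp + 1) * m] = P[x, xp] * G
    # Fixed point: (I - A) p = 0, plus one normalisation row per state
    # (each state's masses must sum to 1); solve the stacked system.
    C = np.kron(np.eye(S), np.ones((1, m)))
    M = np.vstack([np.eye(S * m) - A, C])
    b = np.concatenate([np.zeros(S * m), np.ones(S)])
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    return p.reshape(S, m)
```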
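
The Experiment Setup row compares DCFP against dynamic-programming baselines run for 30,000 iterations. For contrast with the one-shot linear solve above, here is a hedged sketch of categorical dynamic programming (CDP), which repeatedly applies the projected operator; it reuses `categorical_projection_matrix` from the DCFP sketch, and `num_iters` plays the role of the paper's iteration budget.

```python
import numpy as np

def cdp_sketch(P, rewards, gamma, z, num_iters):
    """Iterate the projected categorical Bellman operator from a uniform
    initial distribution; this converges to the same fixed point that
    DCFP solves for directly. Reuses categorical_projection_matrix from
    the DCFP sketch above."""
    S, m = P.shape[0], len(z)
    Gs = [categorical_projection_matrix(z, rewards[x], gamma) for x in range(S)]
    p = np.full((S, m), 1.0 / m)
    for _ in range(num_iters):
        # Mix next-state distributions, then project back onto the atoms.
        p = np.stack([Gs[x] @ (P[x] @ p) for x in range(S)])
    return p
```

On the synthetic chain above, `cdp_sketch(P_hat, rewards, 0.9, z, 30_000)` and `dcfp_sketch(P_hat, rewards, 0.9, z)` should agree up to numerical error, since the projected operator is a contraction with a unique fixed point; the paper's experiments repeat such comparisons 30 times with independently sampled transition matrices.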