Foundations of Multivariate Distributional Reinforcement Learning

Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Mark Rowland

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, with the aid of our technical results and simulations, we identify tradeoffs between distribution representations that influence the performance of multivariate distributional RL in practice." See also Section 6.1, Simulations: Distributional Successor Features. |
| Researcher Affiliation | Collaboration | Harley Wiltzer (Mila - Québec AI Institute; McGill University) harley.wiltzer@mail.mcgill.ca; Jesse Farebrother (Mila - Québec AI Institute; McGill University) jfarebro@cs.mcgill.ca; Arthur Gretton (Google DeepMind; Gatsby Unit, University College London) gretton@google.com; Mark Rowland (Google DeepMind) markrowland@google.com |
| Pseudocode | Yes | Algorithm 1: Projected Categorical Dynamic Programming. A hedged sketch of a projected categorical backup appears below the table. |
| Open Source Code | No | The NeurIPS Paper Checklist states "Code will be provided.", which is a promise of future availability, not a current release of the code for the work described in the paper. |
| Open Datasets | No | The paper describes using "100 random MDPs, with transitions drawn from Dirichlet priors and 2-dimensional cumulants drawn from uniform priors." This indicates custom-generated data rather than a specific, named, publicly available dataset with a concrete access link or formal citation. A sketch of such a generator appears below the table. |
| Dataset Splits | No | The paper does not explicitly describe training/validation/test splits, nor does it reference predefined splits or cross-validation setups for the MDP data used in the experiments. |
| Hardware Specification | Yes | TD-learning experiments were conducted on an NVIDIA A100 80GB GPU to parallelize experiments. |
| Software Dependencies | No | The paper mentions software such as JAX [BFH+18], JAXopt [BBC+21], and the Julia programming language [BEKS17], but it does not provide specific version numbers for these components (e.g., JAX 0.x or Julia 1.x). |
| Experiment Setup | Yes | SGD was used for optimization, with an annealed learning-rate schedule (λ_k)_{k ≥ 0} given by λ_k = k^{-3/5}, satisfying the conditions of Lemma 10. A sketch of this schedule appears below the table. |
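
For the Pseudocode row: the paper's Algorithm 1 (Projected Categorical Dynamic Programming) operates on multivariate return distributions and is not reproduced here. The following is a minimal scalar-return sketch of the standard projected categorical backup it generalizes; the function names, the evenly spaced support grid, and the restriction to a fixed-policy transition matrix `P` with reward vector `r` are our assumptions, not the paper's specification.

```python
import numpy as np

def categorical_projection(atoms, support, probs):
    """Project probability mass at locations `atoms` (weights `probs`)
    onto the fixed, evenly spaced grid `support`."""
    z_min, z_max = support[0], support[-1]
    dz = support[1] - support[0]
    b = (np.clip(atoms, z_min, z_max) - z_min) / dz  # fractional grid index
    lo = np.floor(b).astype(int)
    hi = np.ceil(b).astype(int)
    out = np.zeros_like(support)
    # Split each atom's mass between its two neighbouring grid points;
    # an atom sitting exactly on a grid point (lo == hi) keeps all its mass.
    np.add.at(out, lo, probs * np.where(lo == hi, 1.0, hi - b))
    np.add.at(out, hi, probs * (b - lo))
    return out

def projected_categorical_dp_step(P, r, gamma, support, dist):
    """One projected distributional Bellman backup under a fixed policy.

    P:    (S, S) state-to-state transition matrix
    r:    (S,)   per-state rewards
    dist: (S, m) categorical probabilities over `support`
    """
    new_dist = np.zeros_like(dist)
    for s in range(len(r)):
        shifted = r[s] + gamma * support  # Bellman-shifted support atoms
        # Mixture over successor states, each projected back onto the grid.
        for s2 in np.flatnonzero(P[s]):
            new_dist[s] += P[s, s2] * categorical_projection(shifted, support, dist[s2])
    return new_dist
```

Iterating `projected_categorical_dp_step` from any initial `dist` converges, since the projected categorical operator is known to be a contraction; in the multivariate setting the support becomes a grid in R^d and the projection changes accordingly.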
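For the Open Datasets row: a generator like the following would reproduce the described setup. The number of states, the Dirichlet concentration `alpha`, and the uniform range [0, 1] are not specified in the quoted text and are assumptions here.

```python
import numpy as np

def sample_random_mdp(n_states, cumulant_dim=2, alpha=1.0, seed=0):
    """Sample one random MDP of the kind described in the paper:
    transition rows drawn from a Dirichlet prior and per-state
    cumulants drawn from a uniform prior."""
    rng = np.random.default_rng(seed)
    # Each state's successor distribution is one Dirichlet(alpha, ..., alpha) draw.
    P = rng.dirichlet(alpha * np.ones(n_states), size=n_states)
    # 2-dimensional cumulants, i.i.d. uniform (range assumed, not stated).
    cumulants = rng.uniform(0.0, 1.0, size=(n_states, cumulant_dim))
    return P, cumulants

# e.g. 100 such MDPs, as in the quoted experiment description
# (the state count of 10 is an assumption):
mdps = [sample_random_mdp(n_states=10, seed=i) for i in range(100)]
```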
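For the Experiment Setup row: the schedule is straightforward to state in code. Reading the conditions of Lemma 10 as Robbins-Monro-type conditions (divergent sum, convergent squared sum) is our inference, since the lemma itself is not quoted above.

```python
def learning_rate(k: int) -> float:
    """Annealed schedule λ_k = k**(-3/5), for steps k = 1, 2, ...

    The exponent 3/5 lies in (1/2, 1], so the schedule satisfies the
    standard stochastic-approximation conditions: sum_k λ_k diverges
    while sum_k λ_k**2 converges.
    """
    return float(k) ** (-3.0 / 5.0)
```

In an SGD loop this is simply `theta -= learning_rate(k) * grad` at step k.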