Diverse Projection Ensembles for Distributional Reinforcement Learning
Authors: Moritz Akiya Zanger, Wendelin Boehmer, Matthijs T. J. Spaan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm on the behavior suite benchmark and Viz Doom and find that diverse projection ensembles lead to significant performance improvements over existing methods on a variety of tasks with the most pronounced gains in directed exploration problems. |
| Researcher Affiliation | Academia | Moritz A. Zanger, Wendelin Böhmer, Matthijs T. J. Spaan; Delft University of Technology, The Netherlands; {m.a.zanger, j.w.bohmer, m.t.j.spaan}@tudelft.nl |
| Pseudocode | Yes | Algorithm 1 PE-DQN |
| Open Source Code | Yes | C51 requires us to define return ranges, which we defined manually and can be found in the online code repository. |
| Open Datasets | Yes | We evaluate our algorithm on the behavior suite (Osband et al., 2020), a benchmark collection of 468 environments, and a set of hard exploration problems in the visual domain Viz Doom (Kempka et al., 2016). |
| Dataset Splits | No | The paper uses reinforcement learning environments (bsuite, Viz Doom) rather than traditional datasets with explicit train/validation/test splits. While a subselection of environments was used for hyperparameter tuning, this does not constitute a dataset split as defined. |
| Hardware Specification | Yes | We deployed bsuite environments in 16 parallel jobs to be executed on 8 NVIDIA Tesla V100S 32GB GPUs, 16 Intel XEON E5-6248R 24C 3.0GHz CPUs, and 64GB of memory in total. |
| Software Dependencies | No | The paper names the Adam optimizer (Kingma and Ba, 2015) and states that the hyperparameter search was conducted using Optuna (Akiba et al., 2019), but it does not provide a complete, versioned list of software dependencies (an illustrative Optuna sketch follows the table). |
| Experiment Setup | Yes | Our experiments are designed to provide us with a better understanding of how PE-DQN operates, in comparison to related algorithms as well as in relation to its algorithmic elements. To this end, we aimed to keep codebases and hyperparameters between all implementations equal up to algorithm-specific parameters, which we optimized with a grid search on a selected subset of problems. Further details regarding the experimental design and implementations are provided in Appendix B. |
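
The evaluation environments cited in the table (the behaviour suite of 468 environment configurations) are distributed as the open-source `bsuite` Python package. As a minimal sketch, not taken from the paper's code, one of these environments can be loaded by id and stepped with the standard dm_env interface; the random policy below is only a stand-in for the agent's action selection.

```python
import numpy as np
import bsuite
from bsuite import sweep

# sweep.SWEEP lists all bsuite environment ids (one per configuration).
print(len(sweep.SWEEP))

# Load a single configuration by id, e.g. the first deep-sea exploration task.
env = bsuite.load_from_id('deep_sea/0')

timestep = env.reset()
while not timestep.last():
    # Random action as a placeholder for the learned policy.
    action = np.random.randint(env.action_spec().num_values)
    timestep = env.step(action)
```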
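
The paper reports that algorithm-specific hyperparameters were tuned with a grid search on a selected subset of problems, using Optuna. The sketch below illustrates how such a setup could look; the parameter names, grid values, and the `train_and_evaluate` stub are assumptions for illustration only, not the grid or code from the paper's Appendix B.

```python
import optuna

def train_and_evaluate(learning_rate: float, ensemble_size: int) -> float:
    """Hypothetical stand-in for a training run on the tuning subset.

    The paper's actual training code is not reproduced here; this stub
    returns a placeholder score in place of the mean episodic return.
    """
    return 0.0

# Illustrative grid; values are assumptions, not the paper's settings.
search_space = {
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "ensemble_size": [2, 4],
}

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_categorical("learning_rate", search_space["learning_rate"])
    k = trial.suggest_categorical("ensemble_size", search_space["ensemble_size"])
    return train_and_evaluate(lr, k)

# GridSampler enumerates every combination, mirroring a plain grid search.
study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.GridSampler(search_space),
)
n_trials = len(search_space["learning_rate"]) * len(search_space["ensemble_size"])
study.optimize(objective, n_trials=n_trials)
```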