Distributional Reinforcement Learning with Monotonic Splines

Authors: Yudong Luo, Guiliang Liu, Haonan Duan, Oliver Schulte, Pascal Poupart

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in stochastic environments show that a dense estimation for quantile functions enhances distributional RL in terms of faster empirical convergence and higher rewards in most cases.
Researcher Affiliation | Academia | Yudong Luo (1,4), Guiliang Liu (1,4), Haonan Duan (2,4), Oliver Schulte (3), Pascal Poupart (1,4); 1 University of Waterloo, 2 University of Toronto, 3 Simon Fraser University, 4 Vector Institute
Pseudocode | Yes | Algorithm 1: DDPG with QR-based distributional critic (apart from FQF) ... Algorithm 6: SAC with QR-based distributional critic (MM)
Open Source Code | Yes | The code for the main experiments is released in the supplementary material.
Open Datasets | Yes | "Hence, in this work, we modify several robotics environments by adding stochasticity, including one discrete environment from OpenAI Gym (Brockman et al., 2016) and nine continuous environments from PyBullet Gym (Ellenberger, 2018-2019)."
Dataset Splits | No | The paper describes training frames/episodes and testing procedures, but it does not specify explicit train/validation/test splits with percentages or sample counts, as a classification or regression paper would for a static dataset.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions software components such as DDPG, SAC, and RMSProp, but does not provide version numbers for any software or libraries.
Experiment Setup | Yes | Table 1: Common hyperparameters for SPL-DQN, NC-QR-DQN, NDQFN, and QR-DQN. Table 2: Common hyperparameters across SPL-DQN, QR-DQN, IQN, FQF, NC-QR-DQN, MM-DQN, and NDQFN. Table 3: Noise settings for different environments in PyBullet Gym. Table 4: Hyperparameters for DDPG and DDPG-based methods. Table 5: Hyperparameters for SAC and SAC-based methods.
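The QR-based critics listed above (QR-DQN and the DDPG/SAC variants) are all trained on the quantile Huber loss of Dabney et al. (2018). As a rough illustration of that objective, here is a minimal NumPy sketch; the function name and array layout are assumptions for this example, not the authors' released code.

```python
import numpy as np

def quantile_huber_loss(td_errors, taus, kappa=1.0):
    """Quantile Huber loss (QR-DQN style) -- illustrative sketch, not the paper's code.

    td_errors: (N, M) pairwise TD errors, target sample j minus predicted quantile i
    taus: (N,) quantile fractions associated with the N predicted quantiles
    kappa: Huber threshold
    """
    abs_err = np.abs(td_errors)
    # Huber loss: quadratic within kappa, linear outside
    huber = np.where(abs_err <= kappa,
                     0.5 * td_errors ** 2,
                     kappa * (abs_err - 0.5 * kappa))
    # Asymmetric quantile weight: under-estimates weighted by tau,
    # over-estimates by (1 - tau)
    weight = np.abs(taus[:, None] - (td_errors < 0).astype(float))
    return float((weight * huber / kappa).mean())
```

At tau = 0.5 the loss is symmetric in the sign of the TD error; as tau approaches 1 it penalizes under-estimation more heavily, which is what makes the learned quantiles spread across the return distribution.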