Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion

Authors: Taehyun Cho, Seungyub Han, Heesoo Lee, Kyungjae Lee, Jungwoo Lee

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we empirically show that our method outperforms other existing distribution-based algorithms in various environments including Atari 55 games."
Researcher Affiliation | Collaboration | Taehyun Cho¹, Seungyub Han¹,³, Heesoo Lee¹, Kyungjae Lee², Jungwoo Lee¹ (¹Seoul National University, ²Chung-Ang University, ³Hodoo AI Labs)
Pseudocode | Yes | "Algorithm 1 Perturbed QR-DQN (PQR)" (see the sketch after this table)
Open Source Code | No | The paper references third-party codebases such as DQN Zoo and Dopamine for comparisons, but provides no explicit statement or link releasing code for its own proposed method (PQR).
Open Datasets | Yes | "Finally, we empirically show that our method outperforms other existing distribution-based algorithms in various environments including Atari 55 games."
Dataset Splits | No | The paper mentions standard benchmark environments such as Atari and N-Chain but does not state training/validation/test split percentages or sample counts.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper does not provide version numbers for its software dependencies or libraries (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | "Table 2: Table of hyperparameter setting"
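
The Pseudocode row above points to Algorithm 1, Perturbed QR-DQN (PQR). Because the paper releases no code, the Python sketch below illustrates only the general randomized-risk idea suggested by the title: instead of acting greedily on the uniform mean of the quantile estimates (as in standard QR-DQN), the agent samples a fresh random weighting over quantiles at each action-selection step. The network architecture, the Dirichlet weight-sampling scheme, and all sizes are illustrative assumptions, not the authors' Algorithm 1.

```python
# Hypothetical sketch in the spirit of Perturbed QR-DQN (PQR).
# NOT the authors' Algorithm 1: the network and the weight-sampling
# scheme are illustrative assumptions only.
import torch
import torch.nn as nn


class QuantileNet(nn.Module):
    """Toy QR-DQN head: maps a state to n_quantiles estimates per action."""

    def __init__(self, state_dim: int, n_actions: int, n_quantiles: int = 32):
        super().__init__()
        self.n_actions = n_actions
        self.n_quantiles = n_quantiles
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions * n_quantiles),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # -> (batch, n_actions, n_quantiles)
        return self.net(state).view(-1, self.n_actions, self.n_quantiles)


def select_action_randomized_risk(net: QuantileNet, state: torch.Tensor) -> int:
    """Greedy action under a freshly sampled random risk criterion.

    Standard QR-DQN averages quantiles uniformly (risk-neutral mean).
    Here we instead draw random nonnegative weights over the quantiles,
    i.e. a randomly perturbed distortion of the return distribution,
    so action selection is not tied to one fixed criterion.
    """
    with torch.no_grad():
        quantiles = net(state)  # (1, n_actions, n_quantiles)
        # Random weights on the probability simplex (Dirichlet is an
        # assumption; the paper's actual sampling scheme may differ).
        w = torch.distributions.Dirichlet(torch.ones(net.n_quantiles)).sample()
        q_values = (quantiles * w).sum(dim=-1)  # (1, n_actions)
        return int(q_values.argmax(dim=-1).item())


# Usage: one action-selection step on a dummy 4-dimensional state.
net = QuantileNet(state_dim=4, n_actions=2)
action = select_action_randomized_risk(net, torch.zeros(1, 4))
print("chosen action:", action)
```

Resampling the weights at every step makes the induced risk criterion random rather than fixed, which matches the title's theme of avoiding the pitfalls of a single (potentially optimistic) criterion; the exact perturbation and its schedule should be taken from the paper's Algorithm 1.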