Non-Crossing Quantile Regression for Distributional Reinforcement Learning

Authors: Fan Zhou, Jianing Wang, Xingdong Feng

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Atari 2600 games show that some state-of-the-art DRL algorithms with the non-crossing modification can significantly outperform their baselines, with faster convergence and better testing performance. |
| Researcher Affiliation | Academia | Fan Zhou, Jianing Wang, Xingdong Feng; School of Statistics and Management, Shanghai University of Finance and Economics. zhoufan@mail.shufe.edu.cn; jianing.wang@163.sufe.edu.cn; feng.xingdong@mail.shufe.edu.cn |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | Yes | We test our method on the full Atari-57 benchmark. We follow all the parameter settings of [5] and initialize the learning rate to be 5 × 10⁻⁵ at the training stage. |
| Dataset Splits | No | The paper mentions "200 million training frames" and "testing scores" but does not specify exact counts or percentages for training, validation, or test splits; it implicitly relies on the standard Atari-57 setup without detailing it. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments are mentioned. |
| Software Dependencies | No | The paper mentions PyTorch but does not provide version numbers for software dependencies. |
| Experiment Setup | Yes | We set the number of quantiles N to be 200 and evaluate both algorithms on 200 million training frames. We follow all the parameter settings of [5] and initialize the learning rate to be 5 × 10⁻⁵ at the training stage. For the exploration set-up, we set the bonus rate c_t in (25) to be 50√(log t / t), which decays with the training step t. For both algorithms, we set κ = 1 for the Huber quantile loss in (22) due to its smoothness. Illustrative sketches of the non-crossing head and these loss/exploration components follow the table. |
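The paper's central idea is constraining the N quantile estimates to be non-decreasing in the quantile level so they never cross. The sketch below shows one standard way to enforce that monotonicity (softmax increments followed by a cumulative sum, with a non-negative learned scale); the class name, layer structure, and parameterization are illustrative assumptions, not necessarily the paper's exact architecture.

```python
# Minimal sketch of a non-crossing quantile head in PyTorch.
# The softmax + cumsum construction is one common monotonicity trick;
# the paper's exact parameterization may differ (hypothetical names).
import torch
import torch.nn as nn

class NonCrossingQuantileHead(nn.Module):
    """Maps a state embedding to N quantile values per action,
    constrained so the quantile estimates never cross."""

    def __init__(self, embed_dim: int, num_actions: int, n_quantiles: int = 200):
        super().__init__()
        self.num_actions = num_actions
        self.n_quantiles = n_quantiles
        # Raw logits for per-quantile increments, plus a scale and an offset
        self.increments = nn.Linear(embed_dim, num_actions * n_quantiles)
        self.scale = nn.Linear(embed_dim, num_actions)
        self.offset = nn.Linear(embed_dim, num_actions)

    def forward(self, phi: torch.Tensor) -> torch.Tensor:
        batch = phi.shape[0]
        logits = self.increments(phi).view(batch, self.num_actions, self.n_quantiles)
        # softmax -> non-negative fractions summing to 1; cumsum -> monotone in (0, 1]
        cdf = torch.cumsum(torch.softmax(logits, dim=-1), dim=-1)
        scale = torch.relu(self.scale(phi)).unsqueeze(-1)   # keep the scale non-negative
        offset = self.offset(phi).unsqueeze(-1)
        # Monotone (non-crossing) quantile estimates per action
        return offset + scale * cdf
```

Because the increments are non-negative by construction, monotonicity holds for any network weights, so no penalty term or post-hoc sorting of quantiles is needed.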
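The experiment-setup row quotes two concrete components: the Huber quantile loss with κ = 1 and the decaying bonus rate c_t = 50√(log t / t). Below is a sketch of the standard QR-DQN-style Huber quantile loss and that schedule; the function names and tensor shapes are illustrative assumptions.

```python
# Sketch of the Huber quantile loss (kappa = 1) used in QR-DQN-style methods,
# plus the decaying exploration bonus rate quoted in the setup.
# Shapes and names are assumptions for illustration only.
import math
import torch

def huber_quantile_loss(pred: torch.Tensor,
                        target: torch.Tensor,
                        taus: torch.Tensor,
                        kappa: float = 1.0) -> torch.Tensor:
    """pred: (B, N) quantile estimates; target: (B, N') target samples;
    taus: (N,) quantile midpoints in (0, 1)."""
    # Pairwise TD errors u[b, i, j] = target[b, j] - pred[b, i]
    u = target.unsqueeze(1) - pred.unsqueeze(2)
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric quantile weighting |tau_i - 1{u < 0}|
    weight = (taus.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(dim=1).mean()

def bonus_rate(t: int) -> float:
    """Exploration bonus rate c_t = 50 * sqrt(log t / t), decaying in t."""
    t = max(t, 1)  # guard the first training step
    return 50.0 * math.sqrt(math.log(t) / t)

# Usage with made-up shapes: batch of 32 transitions, N = 200 quantiles
taus = (torch.arange(200, dtype=torch.float32) + 0.5) / 200
loss = huber_quantile_loss(torch.randn(32, 200), torch.randn(32, 200), taus)
```

With κ = 1 the loss is quadratic for TD errors below 1 in magnitude and linear beyond, which is the smoothness property the setup cites as the reason for this choice.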