Non-Crossing Quantile Regression for Distributional Reinforcement Learning
Authors: Fan Zhou, Jianing Wang, Xingdong Feng
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Atari 2600 games show that some state-of-the-art DRL algorithms with the non-crossing modification can significantly outperform their baselines in terms of faster convergence and better testing performance. |
| Researcher Affiliation | Academia | Fan Zhou, Jianing Wang, Xingdong Feng; School of Statistics and Management, Shanghai University of Finance and Economics; zhoufan@mail.shufe.edu.cn; jianing.wang@163.sufe.edu.cn; feng.xingdong@mail.shufe.edu.cn |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | Yes | We test our method on the full Atari-57 benchmark. We follow all the parameter settings of [5] and initialize the learning rate to be 5 × 10⁻⁵ at the training stage. |
| Dataset Splits | No | The paper mentions '200 million training frames' and 'testing scores' but does not specify the exact percentages or counts for training, validation, or test splits. It implicitly relies on the standard setup for Atari-57 without detailing it. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | We set the number of quantiles N to be 200 and evaluate both algorithms on 200 million training frames. We follow all the parameter settings of [5] and initialize the learning rate to be 5 × 10⁻⁵ at the training stage. For the exploration set-up, we set the bonus rate c_t in (25) to be 50√(log t / t), which decays with the training step t. For both algorithms, we set κ = 1 for the Huber quantile loss in (22) due to its smoothness. (Hedged code sketches of this loss, the bonus schedule, and the non-crossing constraint follow the table.) |
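For concreteness, here is a minimal sketch, in PyTorch (which the paper reports using, though no versions or official code are released), of the two quantities quoted in the Experiment Setup row: the Huber quantile loss with κ = 1 and the decaying bonus rate c_t = 50√(log t / t). All function names, tensor shapes, and the `scale` parameter are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: Huber quantile loss (kappa = 1) and the decaying
# exploration bonus rate quoted in the Experiment Setup row. Names and
# shapes are assumptions; the authors' code is not publicly available.
import math
import torch

def huber_quantile_loss(pred, target, taus, kappa=1.0):
    # pred, target: (batch, N) quantile estimates; taus: (N,) fractions in (0, 1).
    u = target.unsqueeze(1) - pred.unsqueeze(2)  # pairwise TD errors, (batch, N, N)
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric quantile weighting |tau - 1{u < 0}| along the prediction axis.
    weight = (taus.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(dim=1).mean()

def bonus_rate(t, scale=50.0):
    # c_t = 50 * sqrt(log t / t); decays with the training step t (t >= 1).
    return scale * math.sqrt(math.log(t) / t)
```

With κ = 1 the loss is quadratic for TD errors below 1 and linear beyond, which is the smoothness property the setup row cites as the reason for this choice.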
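The paper's titular non-crossing constraint can be illustrated with one standard construction for monotone quantile estimates: predict softmax-normalized increments and take their cumulative sum, so the quantile curve is increasing by design. The sketch below assumes a PyTorch-style head with hypothetical layer names and a softplus-positive scale; it shows the general technique, not the authors' exact architecture.

```python
# Hypothetical sketch: a quantile head whose outputs cannot cross, built from
# a cumulative sum of softmax increments. Layer names and the scale/offset
# parameterization are assumptions, not the authors' released architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneQuantileHead(nn.Module):
    def __init__(self, feature_dim, n_actions, n_quantiles=200):
        super().__init__()
        self.n_actions, self.n_quantiles = n_actions, n_quantiles
        self.logits = nn.Linear(feature_dim, n_actions * n_quantiles)  # increment logits
        self.scale = nn.Linear(feature_dim, n_actions)                 # per-action range
        self.offset = nn.Linear(feature_dim, n_actions)                # per-action shift

    def forward(self, features):
        b = features.size(0)
        logits = self.logits(features).view(b, self.n_actions, self.n_quantiles)
        # Softmax increments are strictly positive, so their cumulative sum is
        # strictly increasing along the quantile axis: estimates never cross.
        cum = torch.softmax(logits, dim=-1).cumsum(dim=-1)
        alpha = F.softplus(self.scale(features)).unsqueeze(-1)
        beta = self.offset(features).unsqueeze(-1)
        return alpha * cum + beta  # (batch, n_actions, n_quantiles), monotone in last dim
```

Because monotonicity is enforced architecturally rather than through a penalty term, the Huber quantile loss sketched above can be used unchanged on these outputs.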