Log-normality and Skewness of Estimated State/Action Values in Reinforcement Learning

Authors: Liangpeng Zhang, Ke Tang, Xin Yao

Venue: NeurIPS 2017

Reproducibility assessment (each entry lists the variable, the result, and the supporting LLM response):

Research Type: Experimental
"In this section, we present our empirical results on the skewness of estimated values. There are two purposes in these experiments: (a) to demonstrate how substantial the harm of the skewness can be; (b) to see the improvement provided by collecting more observations, as mentioned in Section 4.1. We conducted experiments in chain MDPs shown in Figure 4."

Researcher Affiliation: Academia
"Liangpeng Zhang (1,2), Ke Tang (3,1), and Xin Yao (3,2). (1) School of Computer Science and Technology, University of Science and Technology of China; (2) University of Birmingham, U.K.; (3) Shenzhen Key Lab of Computational Intelligence, Department of Computer Science and Engineering, Southern University of Science and Technology, China."

Pseudocode: No
The paper does not contain any pseudocode or algorithm blocks.

Open Source Code: No
No explicit statement about providing open-source code or a link to a code repository is found in the paper.

Open Datasets: No
"We conducted experiments in chain MDPs shown in Figure 4. ... We also conducted experiments in the complex maze domain [26] in the same manner as above. The maze used is given in Figure 6 (a)."

Dataset Splits: No
"In each run of experiment, m observations were collected for each state-action pair, resulting in a data set of size $2mn$." (A worked instance of this count is given below.)

Hardware Specification: No
No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper.

Software Dependencies: No
The paper discusses algorithms but does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks).

Experiment Setup: Yes
"The empirical and theoretical distributions of estimated state value $\hat{V}^{\pi^+}(s_1)$ with $m = 200$, $n = 20$, $p = 0.1$, $r_G = 10^6$ in 1000 runs is shown in Figure 5 (a). ... under discount factor $\gamma = 0.9$. ... Figure 6 (b) shows the empirical distribution of estimated value $\hat{V}^{\pi}(s_{\mathrm{start}}, \text{no flag})$ under $\gamma = 0.9$ and $m = 10$ in 1000 runs." (A sketch of this kind of estimation experiment follows below.)
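
As a quick sanity check on the $2mn$ data-set size quoted under Dataset Splits, plugging in the chain-MDP values from the Experiment Setup row (assuming they refer to the same chain experiment: $m = 200$ observations per state-action pair, $n = 20$ states, and the factor 2 for two actions per state) gives:

```latex
% Worked instance of the data-set size 2mn quoted under "Dataset Splits",
% using the chain-MDP values m = 200 and n = 20 from "Experiment Setup".
\[
  2mn = 2 \times 200 \times 20 = 8000 \ \text{observations per run}
\]
```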
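
To make the quoted experiment setup concrete, below is a minimal Python sketch (not the authors' code) of the kind of Monte Carlo value-estimation experiment the paper describes: estimate $\hat{V}^{\pi}(s_1)$ as the average of $m$ sampled returns, repeat over many runs, and inspect the skewness of the resulting estimates. The values $m = 200$, $n = 20$, $p = 0.1$, $r_G = 10^6$, $\gamma = 0.9$ and the 1000-run protocol are taken from the quoted text; the chain dynamics (slip back with probability $p$, goal reward at state $n$, zero rewards elsewhere), the truncation length, and the helper names `sample_return` and `estimate_value` are assumptions for illustration, not the Figure 4 MDP itself. The sketch also simplifies data collection: it samples $m$ returns from the start state only, rather than building the full $2mn$ data set of per-state-action observations.

```python
import random
import statistics

def sample_return(n=20, p=0.1, r_G=1e6, gamma=0.9, max_steps=1000):
    """Sample one discounted return from state s_1 of an assumed n-state chain.

    Dynamics (an assumption, not the paper's Figure 4 MDP): the agent tries to
    move one state toward the goal each step; with probability p it slips back
    one state instead. Reaching state n yields the goal reward r_G and ends the
    episode; all other rewards are zero.
    """
    s, discount = 1, 1.0
    for _ in range(max_steps):
        if s == n:
            return discount * r_G
        s = s + 1 if random.random() > p else max(1, s - 1)
        discount *= gamma
    return 0.0  # episode truncated without reaching the goal

def estimate_value(m=200, **mdp_kwargs):
    """Monte Carlo estimate of V(s_1): the average of m sampled returns."""
    return sum(sample_return(**mdp_kwargs) for _ in range(m)) / m

if __name__ == "__main__":
    # Repeat the estimation 1000 times (the run count quoted from the paper)
    # and summarise the distribution of the resulting estimates.
    runs = [estimate_value(m=200, n=20, p=0.1, r_G=1e6, gamma=0.9)
            for _ in range(1000)]
    mu, sd = statistics.mean(runs), statistics.pstdev(runs)
    skew = sum((x - mu) ** 3 for x in runs) / (len(runs) * sd ** 3)
    print(f"mean={mu:.4g}  median={statistics.median(runs):.4g}  skewness={skew:.3f}")
```

Running it prints the mean, median, and sample skewness of the 1000 estimates; a histogram of `runs` would be the analogue of the empirical distributions the paper plots in Figures 5 and 6 (b).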