Log-normality and Skewness of Estimated State/Action Values in Reinforcement Learning
Authors: Liangpeng Zhang, Ke Tang, Xin Yao
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present our empirical results on the skewness of estimated values. There are two purposes in these experiments: (a) to demonstrate how substantial the harm of the skewness can be; (b) to see the improvement provided by collecting more observations, as mentioned in Section 4.1. We conducted experiments in chain MDPs shown in Figure 4. |
| Researcher Affiliation | Academia | Liangpeng Zhang (1,2), Ke Tang (3,1), and Xin Yao (3,2). (1) School of Computer Science and Technology, University of Science and Technology of China; (2) University of Birmingham, U.K.; (3) Shenzhen Key Lab of Computational Intelligence, Department of Computer Science and Engineering, Southern University of Science and Technology, China |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository is found in the paper. |
| Open Datasets | No | We conducted experiments in chain MDPs shown in Figure 4. ... We also conducted experiments in the complex maze domain [26] in the same manner as above. The maze used is given in Figure 6 (a). |
| Dataset Splits | No | In each run of experiment, m observations were collected for each state-action pair, resulting in a data set of size 2mn. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper discusses algorithms but does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | The empirical and theoretical distributions of estimated state value $\hat{V}^{\pi^+}(s_1)$ with $m = 200$, $n = 20$, $p = 0.1$, $r_G = 10^6$ in 1000 runs is shown in Figure 5 (a). ... under discount factor $\gamma = 0.9$. ... Figure 6 (b) shows the empirical distribution of estimated value $\hat{V}^{\pi}(s_{\text{start}}, \text{no flag})$ under $\gamma = 0.9$ and $m = 10$ in 1000 runs. |
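
The data-collection scheme quoted in the table (m observations for each state-action pair, giving a data set of size 2mn) can be sketched as follows. This is a minimal illustration, not the authors' code: the paper releases none, and the chain dynamics of Figure 4 are not reproduced here. The names `collect_observations` and `toy_step`, and the transition rule inside `toy_step`, are hypothetical placeholders; only the values m = 200, n = 20, p = 0.1, and r_G = 1e6 come from the quoted setup.

```python
import random


def toy_step(s, a, rng, n=20, p=0.1, r_goal=1e6):
    """HYPOTHETICAL dynamics, assumed for illustration only.

    This is NOT the chain MDP from Figure 4 of the paper. Action 0 moves one
    state forward with probability p; action 1 always stays in place. The
    reward r_goal is paid on entering the last state.
    """
    if a == 0 and s < n - 1 and rng.random() < p:
        s_next = s + 1
    else:
        s_next = s
    reward = r_goal if (s_next == n - 1 and s != n - 1) else 0.0
    return s_next, reward


def collect_observations(n, m, step, seed=0):
    """Collect m observations for each of the 2 * n state-action pairs."""
    rng = random.Random(seed)
    data = []
    for s in range(n):          # n states
        for a in (0, 1):        # 2 actions per state
            for _ in range(m):  # m samples per state-action pair
                s_next, reward = step(s, a, rng)
                data.append((s, a, reward, s_next))
    return data


data = collect_observations(n=20, m=200, step=toy_step)
print(len(data))  # 2 * m * n = 2 * 200 * 20 = 8000
```

With the chain-MDP settings from the table, one run therefore uses 8000 observations; repeating the collection with different seeds would correspond to the 1000 independent runs mentioned in the setup.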