Loop Estimator for Discounted Values in Markov Reward Processes
Authors: Falcon Z. Dai, Matthew R. Walter (pp. 7169-7175)
Venue: AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Numerical Experiments: We consider River Swim, an MDP proposed by Strehl and Littman (2008) that is often used to illustrate the challenge of exploration in RL. ... We compare the estimation errors measured in ∞-norm, which is important in RL. The results are shown in Figure 2. |
| Researcher Affiliation | Academia | Falcon Z. Dai, Matthew R. Walter Toyota Technological Institute at Chicago Chicago, Illinois, USA 60637 {dai, mwalter}@ttic.edu |
| Pseudocode | Yes | Algorithm 1 Loop estimator (for a specific state); a hedged sketch of the idea appears below the table. |
| Open Source Code | Yes | An implementation of the proposed loop estimator and presented experiments is publicly available: https://github.com/falcondai/loop-estimator |
| Open Datasets | Yes | We consider River Swim, an MDP proposed by Strehl and Littman (2008) that is often used to illustrate the challenge of exploration in RL. |
| Dataset Splits | No | The paper describes experiments on a 'single sample path' from the River Swim MDP but does not explicitly provide train/validation/test dataset splits with percentages, counts, or a detailed splitting methodology. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or specific computing environments) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, used to replicate the experiment. |
| Experiment Setup | Yes | The paper specifies values for the discount factor γ (e.g., 'γ = 0.9' and 'γ = 0.99'), the sample path length 'T = 10^5', and the learning-rate parameters for the TD(k) estimators (e.g., 'd = 1' and 'd = 1/2'); a TD-style sketch using this schedule follows the table. |
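
The table above cites the paper's Algorithm 1, the loop estimator for a specific state. As a point of reference, here is a minimal Python sketch of the regenerative idea behind it: split a single sample path into loops that start and end at the anchor state `s`, then form a plug-in estimate `V(s) ≈ mean(G) / (1 - mean(γ^τ))`, where `G` is the discounted reward collected over a loop and `τ` is the loop's length. The function name, the array layout, and the exact plug-in form are assumptions made for illustration; the authors' reference implementation is in the repository linked above.

```python
import numpy as np

def loop_estimate(states, rewards, s, gamma):
    """Sketch of a loop-style estimate of the discounted value V(s).

    Assumes states[t] and rewards[t] are aligned, i.e. rewards[t] is the
    reward received when leaving states[t]. Splits the path into loops
    that start and end at s, and returns mean(G) / (1 - mean(gamma**tau)).
    """
    # Time steps at which the anchor state s is visited.
    visits = [t for t, x in enumerate(states) if x == s]
    if len(visits) < 2:
        return None  # need at least one completed loop

    loop_returns, loop_discounts = [], []
    for start, end in zip(visits[:-1], visits[1:]):
        tau = end - start  # loop length
        # Discounted reward accumulated over this loop.
        G = sum(gamma**k * rewards[start + k] for k in range(tau))
        loop_returns.append(G)
        loop_discounts.append(gamma**tau)

    # Plug-in estimate: V(s) = E[G] / (1 - E[gamma^tau]).
    return np.mean(loop_returns) / (1.0 - np.mean(loop_discounts))
```

Since γ < 1 and every loop has length at least 1, the denominator is strictly positive, so the estimate is well defined whenever at least one loop has completed.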
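
The quoted experiment setup compares against TD(k) baselines with learning rates parameterized by d (d = 1 and d = 1/2, commonly a polynomial schedule α_n = n^(-d)). The sketch below shows a plain TD(0) estimator with that schedule on a single sample path; that TD(0) stands in for TD(k), and that step counts are tracked per state, are both assumptions here.

```python
import numpy as np

def td0_estimate(states, rewards, n_states, gamma, d):
    """Sketch of TD(0) on one sample path with step size alpha_n = n^(-d).

    Uses (hypothetically) per-state visit counts n; d = 1 and d = 1/2
    correspond to the setups quoted in the table above.
    """
    V = np.zeros(n_states)
    counts = np.zeros(n_states, dtype=int)
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        counts[s] += 1
        alpha = counts[s] ** (-d)  # polynomial step-size schedule
        # One-step bootstrapped target and TD update.
        target = rewards[t] + gamma * V[s_next]
        V[s] += alpha * (target - V[s])
    return V
```

On a path of the quoted length T = 10^5 with γ = 0.9 or γ = 0.99, both sketches consume the same `(states, rewards)` arrays, so their estimates can be compared directly, e.g. via `np.abs(V_hat - V_true).max()` for the ∞-norm error.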