Loop Estimator for Discounted Values in Markov Reward Processes
Authors: Falcon Z. Dai, Matthew R. Walter (pp. 7169-7175)
Venue: AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Numerical Experiments: We consider River Swim, an MDP proposed by Strehl and Littman (2008) that is often used to illustrate the challenge of exploration in RL. ... We compare the estimation errors measured in ∞-norm, which is important in RL. The results are shown in Figure 2. |
| Researcher Affiliation | Academia | Falcon Z. Dai, Matthew R. Walter Toyota Technological Institute at Chicago Chicago, Illinois, USA 60637 {dai, mwalter}@ttic.edu |
| Pseudocode | Yes | Algorithm 1 Loop estimator (for a specific state); a hedged sketch of the idea appears below the table. |
| Open Source Code | Yes | An implementation of the proposed loop estimator and presented experiments is publicly available: https://github.com/falcondai/loop-estimator |
| Open Datasets | Yes | We consider River Swim, an MDP proposed by Strehl and Littman (2008) that is often used to illustrate the challenge of exploration in RL. |
| Dataset Splits | No | The paper describes experiments on a 'single sample path' from the River Swim MDP but does not explicitly provide train/validation/test dataset splits with percentages, counts, or a detailed splitting methodology. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or specific computing environments) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, used to replicate the experiment. |
| Experiment Setup | Yes | The paper specifies values for the discount factor γ (e.g., 'γ = 0.9' and 'γ = 0.99'), the sample path length 'T = 10^5', and the learning-rate parameters for the TD(k) estimators (e.g., 'd = 1' and 'd = 1/2'); a TD-style sketch using this schedule follows the table. |
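
The table above cites the paper's Algorithm 1, the loop estimator for a specific state. As a point of reference, here is a minimal Python sketch of the regenerative idea behind it: split a single sample path into loops that start and end at the anchor state `s`, then form a plug-in estimate `V(s) ≈ mean(G) / (1 - mean(γ^τ))`, where `G` is the discounted reward collected over a loop and `τ` is the loop's length. The function name, the array layout, and the exact plug-in form are assumptions made for illustration; the authors' reference implementation is in the repository linked above.

```python
import numpy as np

def loop_estimate(states, rewards, s, gamma):
    """Sketch of a loop-style estimate of the discounted value V(s).

    Assumes states[t] and rewards[t] are aligned, i.e. rewards[t] is the
    reward received when leaving states[t]. Splits the path into loops
    that start and end at s, and returns mean(G) / (1 - mean(gamma**tau)).
    """
    # Time steps at which the anchor state s is visited.
    visits = [t for t, x in enumerate(states) if x == s]
    if len(visits) < 2:
        return None  # need at least one completed loop

    loop_returns, loop_discounts = [], []
    for start, end in zip(visits[:-1], visits[1:]):
        tau = end - start  # loop length
        # Discounted reward accumulated over this loop.
        G = sum(gamma**k * rewards[start + k] for k in range(tau))
        loop_returns.append(G)
        loop_discounts.append(gamma**tau)

    # Plug-in estimate: V(s) = E[G] / (1 - E[gamma^tau]).
    return np.mean(loop_returns) / (1.0 - np.mean(loop_discounts))
```

Since γ < 1 and every loop has length at least 1, the denominator is strictly positive, so the estimate is well defined whenever at least one loop has completed.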
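
The quoted experiment setup compares against TD(k) baselines with learning rates parameterized by d (d = 1 and d = 1/2, commonly a polynomial schedule α_n = n^(-d)). The sketch below shows a plain TD(0) estimator with that schedule on a single sample path; that TD(0) stands in for TD(k), and that step counts are tracked per state, are both assumptions here.

```python
import numpy as np

def td0_estimate(states, rewards, n_states, gamma, d):
    """Sketch of TD(0) on one sample path with step size alpha_n = n^(-d).

    Uses (hypothetically) per-state visit counts n; d = 1 and d = 1/2
    correspond to the setups quoted in the table above.
    """
    V = np.zeros(n_states)
    counts = np.zeros(n_states, dtype=int)
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        counts[s] += 1
        alpha = counts[s] ** (-d)  # polynomial step-size schedule
        # One-step bootstrapped target and TD update.
        target = rewards[t] + gamma * V[s_next]
        V[s] += alpha * (target - V[s])
    return V
```

On a path of the quoted length T = 10^5 with γ = 0.9 or γ = 0.99, both sketches consume the same `(states, rewards)` arrays, so their estimates can be compared directly, e.g. via `np.abs(V_hat - V_true).max()` for the ∞-norm error.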