Loop Estimator for Discounted Values in Markov Reward Processes

Authors: Falcon Z. Dai, Matthew R. Walter (pp. 7169-7175)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5 Numerical Experiments: We consider River Swim, an MDP proposed by Strehl and Littman (2008) that is often used to illustrate the challenge of exploration in RL. ... We compare the estimation errors measured in the ∞-norm, which is important in RL. The results are shown in Figure 2."
Researcher Affiliation | Academia | Falcon Z. Dai, Matthew R. Walter, Toyota Technological Institute at Chicago, Chicago, Illinois, USA 60637, {dai, mwalter}@ttic.edu
Pseudocode | Yes | "Algorithm 1: Loop estimator (for a specific state)" (see the sketch after this table)
Open Source Code | Yes | "An implementation of the proposed loop estimator and presented experiments is publicly available." https://github.com/falcondai/loop-estimator
Open Datasets | Yes | "We consider River Swim, an MDP proposed by Strehl and Littman (2008) that is often used to illustrate the challenge of exploration in RL."
Dataset Splits | No | The paper describes experiments on a single sample path from the River Swim MDP but does not provide train/validation/test dataset splits with percentages, counts, or a detailed splitting methodology.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or computing environment) used for running its experiments.
Software Dependencies | No | The paper does not provide ancillary software details, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | The paper specifies values for the discount factor γ (e.g., 'γ = 0.9' and 'γ = 0.99'), the sample path length 'T = 10^5', and learning rate parameters for the TD(k) estimators (e.g., 'd = 1' and 'd = 1/2'); a sketch of such a setup follows the table.
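
The pseudocode row above refers to the paper's Algorithm 1. The following is a minimal, hedged sketch of a loop-style estimator for a single state, based on the regenerative-structure idea the paper describes: the value of a state s satisfies V(s) = E[discounted reward over one loop back to s] + E[γ^τ] V(s), so V(s) can be estimated by dividing the empirical mean loop return by one minus the empirical mean loop discount. The function and variable names here are ours, not taken from the paper or its repository.

```python
import numpy as np

def loop_estimate(states, rewards, s, gamma=0.9):
    """Hedged sketch of a loop-style estimate of the discounted value of state s.

    states:  visited states s_0, ..., s_T along a single sample path
    rewards: rewards r_0, ..., r_{T-1}, where r_t is received when leaving s_t
    Returns an estimate of V(s), or None if s is visited fewer than twice.
    """
    # Time steps at which the path visits state s.
    visits = [t for t, st in enumerate(states) if st == s]
    if len(visits) < 2:
        return None  # no complete loop (return to s) observed yet

    loop_returns = []    # discounted reward accumulated over each loop at s
    loop_discounts = []  # gamma ** (loop length) for each loop
    for start, end in zip(visits[:-1], visits[1:]):
        g = sum(gamma ** (t - start) * rewards[t] for t in range(start, end))
        loop_returns.append(g)
        loop_discounts.append(gamma ** (end - start))

    # Regenerative identity: V(s) = E[G_loop] + E[gamma^tau] * V(s),
    # so the plug-in estimate is mean(G_loop) / (1 - mean(gamma^tau)).
    return np.mean(loop_returns) / (1.0 - np.mean(loop_discounts))
```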
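The experiment-setup row mentions a single sample path of length T = 10^5, discount factors γ ∈ {0.9, 0.99}, TD learning rates of the form 1/n^d with d ∈ {1, 1/2}, and errors measured in the ∞-norm. The sketch below shows how such a comparison could be wired up under those settings; it uses tabular TD(0) as an illustrative stand-in for the paper's TD(k) estimators, reuses the loop_estimate sketch above, and leaves the transition matrix P and reward vector R as inputs rather than reproducing River Swim's numbers. All helper names are ours.

```python
import numpy as np

def true_values(P, R, gamma):
    """Exact discounted values of an MRP: solve (I - gamma * P) V = R."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, R)

def td_zero(states, rewards, n_states, gamma, d):
    """Tabular TD(0) with per-state learning rate 1 / (visit count)**d."""
    V = np.zeros(n_states)
    counts = np.zeros(n_states)
    for t in range(len(rewards)):
        s, s_next = states[t], states[t + 1]
        counts[s] += 1
        alpha = 1.0 / counts[s] ** d
        V[s] += alpha * (rewards[t] + gamma * V[s_next] - V[s])
    return V

def run_comparison(P, R, gamma=0.9, T=10**5, d=0.5, seed=0):
    """Roll out one sample path of length T and compare estimators in the ∞-norm."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    states, rewards = [0], []
    for _ in range(T):
        s = states[-1]
        rewards.append(R[s])
        states.append(rng.choice(n, p=P[s]))

    V_star = true_values(P, R, gamma)
    V_td = td_zero(states, rewards, n, gamma, d)
    # Fall back to 0 for states with no completed loop.
    V_loop = np.array([loop_estimate(states, rewards, s, gamma) or 0.0
                       for s in range(n)])
    return {
        "td_error": np.max(np.abs(V_td - V_star)),
        "loop_error": np.max(np.abs(V_loop - V_star)),
    }
```

A caller would supply the River Swim transition matrix and reward vector under the behaviour policy used in the paper; we do not reproduce those values here, and the comparison above is only a schematic of the reported experiment, not a re-run of it.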