Policy Evaluation for Variance in Average Reward Reinforcement Learning

Authors: Shubhada Agrawal, Prashanth L.A., Siva Theja Maguluri

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We design a temporal-difference (TD) type algorithm tailored for policy evaluation in this context. Our algorithm is based on linear stochastic approximation of an equivalent formulation of the asymptotic variance in terms of the solution of the Poisson equation. We consider both the tabular and linear function approximation settings, and establish an O(1/k) finite-time convergence rate, where k is the number of steps of the algorithm. We develop the first finite-sample error bounds for the policy evaluation problem for asymptotic variance in a tabular setting, proving an O(1/k) rate of convergence for the mean-squared error, where k is the time step. Here, the O(·) notation hides log k factors and lower-order dependencies.
Researcher Affiliation | Academia | H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, USA; Department of Computer Science and Engineering, Indian Institute of Technology Madras, India. Correspondence to: Shubhada Agrawal <sagrawal362@gatech.edu>.
Pseudocode | Yes | Algorithm 1 (Policy Evaluation: Tabular Setting); a hedged sketch of a TD-type recursion in this spirit appears after this table.
Open Source Code | No | The paper does not include any statement or link providing concrete access to source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not describe experiments on a specific public dataset, so no information about concrete access to a publicly available dataset is provided.
Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation with datasets, so no training/validation/test splits are reported.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory), as it is a theoretical work without empirical evaluation.
Software Dependencies | No | The paper does not provide ancillary software details with version numbers, as it focuses on theoretical algorithm design and analysis.
Experiment Setup | No | The paper does not provide experimental setup details such as concrete hyperparameter values or training configurations, as it is a theoretical paper without empirical experiments.
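
Since the paper releases no code, the following is a minimal Python sketch of a TD-type recursion of the kind described above, restricted to the tabular setting. It relies on the standard martingale-difference identity sigma^2 = E_pi[(r(s) - rbar + V(s') - V(s))^2], where V solves the Poisson equation and rbar is the average reward. The single 1/k step size, the toy Markov chain, and all function names are illustrative assumptions; this is not the paper's Algorithm 1.

```python
import numpy as np

def td_variance_tabular(P, r, num_steps, seed=0):
    """Hedged sketch: jointly estimate the average reward, the
    differential (bias) value function, and the asymptotic variance
    of the reward in an ergodic Markov reward process.

    Assumes sigma^2 = E_pi[(r(s) - rbar + V(s') - V(s))^2], with V
    solving the Poisson equation. The identity is standard, but the
    recursion below is an illustrative stand-in, not the paper's
    exact Algorithm 1.

    P : (n, n) transition matrix under the fixed policy
    r : (n,) reward vector
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    rbar = 0.0           # running estimate of the average reward
    V = np.zeros(n)      # running estimates of the differential values
    var = 0.0            # running estimate of the asymptotic variance
    s = int(rng.integers(n))
    for k in range(1, num_steps + 1):
        s_next = int(rng.choice(n, p=P[s]))
        alpha = 1.0 / k                           # illustrative 1/k step size
        delta = r[s] - rbar + V[s_next] - V[s]    # average-reward TD error
        rbar += alpha * (r[s] - rbar)             # track E_pi[r]
        V[s] += alpha * delta                     # TD(0) on the Poisson equation
        var += alpha * (delta ** 2 - var)         # track E_pi[delta^2]
        s = s_next
    return rbar, V, var

if __name__ == "__main__":
    # Tiny 3-state chain, purely for illustration.
    P = np.array([[0.1, 0.6, 0.3],
                  [0.4, 0.2, 0.4],
                  [0.3, 0.3, 0.4]])
    r = np.array([1.0, 0.0, 2.0])
    rbar, V, var = td_variance_tabular(P, r, num_steps=500_000)
    print(f"average reward ~ {rbar:.3f}, asymptotic variance ~ {var:.3f}")
```

Because V(s') - V(s) is invariant to an additive shift of V, the variance recursion does not depend on how V is anchored. A more faithful implementation would use separate step-size schedules for the three coupled recursions; controlling their interaction is precisely where a finite-time O(1/k) mean-squared-error analysis of the kind the paper proves becomes delicate.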