Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning

Authors: Sheng Zhang, Zhe Zhang, Siva Theja Maguluri

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | In this section we present empirical results of the average-reward TD(λ) with linear function approximation (i.e., Algorithm 1). In our simulation, we consider a randomly generated MRP with |S| = 100 states and a randomly generated feature matrix Φ with d = 20 features and e ∉ W_Φ. Experimental details and figures are provided in Appendix C.

Researcher Affiliation | Academia | Sheng Zhang, Zhe Zhang, Siva Theja Maguluri; The H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology; {shengzhang, jimmy_zhang, siva.theja}@gatech.edu

Pseudocode | Yes | Algorithm 1: TD(λ) with linear function approximation; Algorithm 2: J-step Synchronous Q-learning

Open Source Code | Yes | All the implementations are publicly available at https://github.com/xiaojianzhang/Average-Reward-TD-Q-Learning

Open Datasets | No | In our simulation, we consider a randomly generated MRP with |S| = 100 states and a randomly generated feature matrix Φ with d = 20 features and e ∉ W_Φ.

Dataset Splits | No | The paper describes generating data from a Markov chain and studying algorithm convergence, but it does not specify explicit training, validation, or test splits in the conventional supervised-learning sense.

Hardware Specification | No | The paper discusses numerical experiments and simulations but does not specify any hardware details (e.g., CPU or GPU models, memory) used for these simulations.

Software Dependencies | No | The paper links to a GitHub repository for the implementations but does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).

Experiment Setup | No | The paper gives general parameters for the randomly generated MRP and conditions on the step sizes (e.g., "properly chosen diminishing step sizes") but does not provide concrete hyperparameter values or training configurations (e.g., exact step sizes, number of iterations) in the main text. It states that "Experimental details and figures are provided in Appendix C", so the specific setup lies outside the provided text.
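To make the assessed experiment concrete, the simulation described above (a randomly generated MRP with |S| = 100 states, a random feature matrix Φ with d = 20 features, and average-reward TD(λ) with linear function approximation) can be sketched as below. This is a minimal illustration, not the authors' implementation: the choice of λ, the diminishing step-size schedule, the random seed, and the horizon are all illustrative assumptions, since the paper defers these specifics to Appendix C and the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes matching the paper's described simulation:
# |S| = 100 states, d = 20 features. lambda is an assumed value.
S, d, lam = 100, 20, 0.5

# Randomly generated MRP: row-stochastic transition matrix P, rewards r.
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(S)

# Random feature matrix; with Gaussian entries, the all-ones vector e
# is almost surely not in the column space of Phi.
Phi = rng.standard_normal((S, d))

theta = np.zeros(d)   # weight vector for the value-function approximation
mu = 0.0              # running estimate of the average reward
z = np.zeros(d)       # eligibility trace
s = 0
for t in range(1, 200_001):
    alpha = 1.0 / (t + 100)             # assumed diminishing step size
    s_next = rng.choice(S, p=P[s])      # sample a transition from the MRP
    # Average-reward TD error: reward minus average-reward estimate
    # plus the difference of approximate values.
    delta = r[s] - mu + Phi[s_next] @ theta - Phi[s] @ theta
    z = lam * z + Phi[s]                # accumulate the eligibility trace
    theta += alpha * delta * z          # TD(lambda) weight update
    mu += alpha * (r[s] - mu)           # track the average reward
    s = s_next
```

Along a single trajectory, `mu` tracks the long-run average reward of the MRP, while `theta` is driven toward the TD(λ) fixed point in the feature subspace.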