Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning
Authors: Sheng Zhang, Zhe Zhang, Siva Theja Maguluri
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we present empirical results of the average-reward TD(λ) with linear function approximation (i.e. Algorithm 1). In our simulation, we consider a randomly generated MRP with \|S\| = 100 states and a randomly generated feature matrix Φ with d = 20 features and e ∉ W_Φ. Experimental details and figures are provided in Appendix C. |
| Researcher Affiliation | Academia | Sheng Zhang, Zhe Zhang, Siva Theja Maguluri; The H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology; {shengzhang, jimmy_zhang, siva.theja}@gatech.edu |
| Pseudocode | Yes | Algorithm 1: TD(λ) with linear function approximation; Algorithm 2: J-step Synchronous Q-learning (hedged sketches of both algorithms appear after this table) |
| Open Source Code | Yes | All the implementations are publicly available: https://github.com/xiaojianzhang/Average-Reward-TD-Q-Learning |
| Open Datasets | No | In our simulation, we consider a randomly generated MRP with \|S\| = 100 states and a randomly generated feature matrix Φ with d = 20 features and e ∉ W_Φ. |
| Dataset Splits | No | The paper generates data from a randomly constructed Markov chain and studies algorithm convergence; it does not specify training, validation, or test splits in the supervised-learning sense. |
| Hardware Specification | No | The paper discusses numerical experiments and simulations but does not specify any hardware details (e.g., CPU, GPU models, memory) used for these simulations. |
| Software Dependencies | No | The paper provides a link to a GitHub repository for implementations but does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | No | The paper gives general parameters for the randomly generated MRP and conditions on the step sizes (e.g., 'properly chosen diminishing step sizes'), but the main text does not report concrete hyperparameter values such as exact step sizes or the number of iterations. It states 'Experimental details and figures are provided in Appendix C', deferring the specific setup to the appendix. |
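
The paper's Algorithm 1 is average-reward TD(λ) with linear function approximation. Below is a minimal sketch of what such a procedure could look like, assuming the standard update structure: a scalar running estimate of the average reward, a differential TD error, and an eligibility trace. The function name `average_reward_td_lambda`, the step sizes, and the random-MRP construction are illustrative assumptions, not the authors' code (their implementation is at the GitHub link above); only the problem scale (\|S\| = 100 states, d = 20 features) is taken from the quoted text.

```python
import numpy as np

def average_reward_td_lambda(P, r, Phi, lam=0.5, alpha=0.01, c=1.0,
                             num_steps=100_000, seed=0):
    """Hedged sketch of average-reward TD(lambda) with linear features.

    P   : (n, n) transition matrix of the MRP
    r   : (n,)   expected one-step rewards
    Phi : (n, d) feature matrix whose rows are phi(s)^T
    """
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    theta = np.zeros(d)   # weights of the linear value approximation
    mu = 0.0              # running estimate of the average reward
    z = np.zeros(d)       # eligibility trace
    s = int(rng.integers(n))
    for _ in range(num_steps):
        s_next = rng.choice(n, p=P[s])
        reward = r[s]
        # Differential TD error: r - mu + V(s') - V(s)
        delta = reward - mu + Phi[s_next] @ theta - Phi[s] @ theta
        z = lam * z + Phi[s]              # accumulate the trace
        theta += alpha * delta * z        # value-weight update
        mu += c * alpha * (reward - mu)   # average-reward update
        s = s_next
    return theta, mu

# Illustrative setup mirroring the quoted experiment scale
# (|S| = 100 states, d = 20 features); the distributions are assumptions.
n, d = 100, 20
rng = np.random.default_rng(1)
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
r = rng.random(n)
Phi = rng.standard_normal((n, d))
theta, mu = average_reward_td_lambda(P, r, Phi)
```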
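
Algorithm 2 is a synchronous Q-learning scheme for the average-reward setting. The paper's exact variant (including its "J-step" structure and its choice of offset) is not reconstructed here; the sketch below assumes a one-step RVI-style synchronous update in which every state-action pair is updated at each iteration and a reference pair anchors the relative Q-values. All names and parameters are illustrative assumptions.

```python
import numpy as np

def synchronous_rvi_q_learning(P, r, alpha=0.05, num_iters=20_000,
                               ref=(0, 0), seed=0):
    """Hedged sketch of synchronous average-reward (RVI-style) Q-learning.

    P   : (n, m, n) transition tensor; P[s, a] is a next-state distribution
    r   : (n, m)    expected one-step rewards
    ref : reference (state, action) pair anchoring the relative Q-values
    """
    rng = np.random.default_rng(seed)
    n, m, _ = P.shape
    Q = np.zeros((n, m))
    for _ in range(num_iters):
        f_Q = Q[ref]  # offset; converges to an estimate of the optimal average reward
        target = np.empty((n, m))
        for s in range(n):
            for a in range(m):
                s_next = rng.choice(n, p=P[s, a])   # one sample per pair
                target[s, a] = r[s, a] - f_Q + Q[s_next].max()
        Q += alpha * (target - Q)   # synchronous update of every (s, a)
    return Q, Q[ref]
```

The synchronous structure, sampling once per (s, a) and updating all entries together, is what "synchronous Q-learning" usually denotes; the subtracted offset term is what distinguishes the average-reward variant from discounted Q-learning.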