Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning

Authors: Sheng Zhang, Zhe Zhang, Siva Theja Maguluri

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | In this section we present empirical results of the average-reward TD(λ) with linear function approximation (i.e., Algorithm 1). In our simulation, we consider a randomly generated MRP with |S| = 100 states and a randomly generated feature matrix Φ with d = 20 features and e ∉ W_Φ. Experimental details and figures are provided in Appendix C.

Researcher Affiliation | Academia | Sheng Zhang, Zhe Zhang, Siva Theja Maguluri; The H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology; {shengzhang, jimmy_zhang, siva.theja}@gatech.edu

Pseudocode | Yes | Algorithm 1: TD(λ) with linear function approximation; Algorithm 2: J-step Synchronous Q-learning

Open Source Code | Yes | All the implementations are publicly available at https://github.com/xiaojianzhang/Average-Reward-TD-Q-Learning

Open Datasets | No | In our simulation, we consider a randomly generated MRP with |S| = 100 states and a randomly generated feature matrix Φ with d = 20 features and e ∉ W_Φ.

Dataset Splits | No | The paper describes generating data from a Markov chain and studying algorithm convergence, but it does not specify explicit training, validation, or test splits in the conventional supervised-learning sense.

Hardware Specification | No | The paper discusses numerical experiments and simulations but does not specify any hardware details (e.g., CPU or GPU models, memory) used for these simulations.

Software Dependencies | No | The paper links to a GitHub repository for the implementations but does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).

Experiment Setup | No | The paper gives general parameters for the randomly generated MRP and conditions on the step sizes (e.g., "properly chosen diminishing step sizes") but does not provide concrete hyperparameter values or training configurations (e.g., exact step sizes, number of iterations) in the main text. It states that "Experimental details and figures are provided in Appendix C", so the specific setup lies outside the provided text.
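To make the assessed experiment concrete, the simulation described above (a randomly generated MRP with |S| = 100 states, a random feature matrix Φ with d = 20 features, and average-reward TD(λ) with linear function approximation) can be sketched as below. This is a minimal illustration, not the authors' implementation: the choice of λ, the diminishing step-size schedule, the random seed, and the horizon are all illustrative assumptions, since the paper defers these specifics to Appendix C and the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes matching the paper's described simulation:
# |S| = 100 states, d = 20 features. lambda is an assumed value.
S, d, lam = 100, 20, 0.5

# Randomly generated MRP: row-stochastic transition matrix P, rewards r.
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(S)

# Random feature matrix; with Gaussian entries, the all-ones vector e
# is almost surely not in the column space of Phi.
Phi = rng.standard_normal((S, d))

theta = np.zeros(d)   # weight vector for the value-function approximation
mu = 0.0              # running estimate of the average reward
z = np.zeros(d)       # eligibility trace
s = 0
for t in range(1, 200_001):
    alpha = 1.0 / (t + 100)             # assumed diminishing step size
    s_next = rng.choice(S, p=P[s])      # sample a transition from the MRP
    # Average-reward TD error: reward minus average-reward estimate
    # plus the difference of approximate values.
    delta = r[s] - mu + Phi[s_next] @ theta - Phi[s] @ theta
    z = lam * z + Phi[s]                # accumulate the eligibility trace
    theta += alpha * delta * z          # TD(lambda) weight update
    mu += alpha * (r[s] - mu)           # track the average reward
    s = s_next
```

Along a single trajectory, `mu` tracks the long-run average reward of the MRP, while `theta` is driven toward the TD(λ) fixed point in the feature subspace.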