Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis

Authors: Gang Wang, Songtao Lu, Georgios Giannakis, Gerald Tesauro, Jian Sun

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present numerical tests to showcase the convergence behaviors of the proposed DTDT as well as the classic DTDL algorithms [26, 11, 33]. The experimental setting is: number of agents N = 100, state-space |S| = 100, length of state |s| = 10, problem dimension d = 10; the communication graph is connected and randomly generated by the Erdős–Rényi model, while the transition matrix P is doubly stochastic and symmetric. Rewards {r(s, a, s′) ≥ 0} were generated randomly, following the uniform distribution over [0, 0.01]. Feature vectors were generated as φ(s) = cos(As), with entries of A ∈ R^{d×|s|} independently drawn from the Gaussian distribution with zero mean and variance 0.01. Figure 1 shows that, with the same stepsize α = 0.3, both the DTDT and DTDL algorithms converge to the same fixed-point solution, but the proposed DTDT achieves a consensus error (disagreement) several orders of magnitude smaller than that of DTDL. From the TD-error viewpoint, our DTDT is also faster than DTDL. These observations are consistent with our analysis, corroborating that gradient tracking is able to track the average of the network's full gradient efficiently at local nodes.
Researcher Affiliation | Collaboration | 1 University of Minnesota, Minneapolis, MN 55455, US; georgios@umn.edu. 2 IBM Research, Yorktown Heights, NY 10598, US; {songtao@ibm.com, gtesauro@us.ibm.com}. 3 Beijing Institute of Technology, Beijing 100081, China; sunjian@bit.edu.cn.
Pseudocode | Yes | Our DTDT algorithm with linear function approximation is tabulated as Alg. 1.
Open Source Code | No | The paper does not provide any concrete access (e.g., repository link, explicit statement of release) to the source code for the methodology described.
Open Datasets | No | The experimental setting is: number of agents N = 100, state-space |S| = 100, length of state |s| = 10, problem dimension d = 10; the communication graph is connected and randomly generated by the Erdős–Rényi model, while the transition matrix P is doubly stochastic and symmetric. Rewards {r(s, a, s′) ≥ 0} were generated randomly, following the uniform distribution over [0, 0.01]. Feature vectors were generated as φ(s) = cos(As), with entries of A ∈ R^{d×|s|} independently drawn from the Gaussian distribution with zero mean and variance 0.01.
Dataset Splits | No | The paper describes generating its own synthetic data and the parameters of the simulation environment but does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper describes the experimental setting and parameters (number of agents, state-space, etc.) but does not specify any hardware used for running the experiments.
Software Dependencies | No | The paper describes the algorithms and numerical results but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | The experimental setting is: number of agents N = 100, state-space |S| = 100, length of state |s| = 10, problem dimension d = 10; the communication graph is connected and randomly generated by the Erdős–Rényi model, while the transition matrix P is doubly stochastic and symmetric. Rewards {r(s, a, s′) ≥ 0} were generated randomly, following the uniform distribution over [0, 0.01]. Feature vectors were generated as φ(s) = cos(As), with entries of A ∈ R^{d×|s|} independently drawn from the Gaussian distribution with zero mean and variance 0.01. Figure 1 shows that, with the same stepsize α = 0.3...
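The quoted experiment setup is concrete enough to sketch in code. Below is a minimal NumPy reconstruction of the synthetic environment only (not the DTDT/DTDL algorithms themselves). Anything the paper does not state is an assumption and is flagged as such in comments: the Erdős–Rényi edge probability p, the raw state vectors fed into φ(s) = cos(As), and the Sinkhorn-style construction used here to obtain a doubly stochastic, symmetric P.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes quoted in the paper's experimental setting.
N = 100      # number of agents
S = 100      # size of the state space |S|
s_len = 10   # length of each state vector |s|
d = 10       # problem/feature dimension

# Communication graph: Erdős–Rényi G(N, p). The edge probability p is an
# assumption (the paper only says the graph is connected and random);
# regeneration until connected is omitted for brevity.
p = 0.1
U = rng.random((N, N))
adj = (((U + U.T) / 2) < p).astype(int)
np.fill_diagonal(adj, 0)

# Transition matrix P: doubly stochastic and symmetric. The paper does not
# say how P was built; one standard construction is symmetric Sinkhorn
# normalization of a random positive symmetric matrix.
M = rng.random((S, S))
M = (M + M.T) / 2
for _ in range(5000):
    M /= M.sum(axis=1, keepdims=True)   # row-normalize
    M = (M + M.T) / 2                   # re-symmetrize
    if np.abs(M.sum(axis=1) - 1.0).max() < 1e-12:
        break
P = M

# Rewards r(s, a, s') >= 0, uniform over [0, 0.01], one table per agent.
R = rng.uniform(0.0, 0.01, size=(N, S, S))

# Features phi(s) = cos(A s), with entries of A in R^{d x |s|} i.i.d.
# Gaussian with mean 0 and variance 0.01 (i.e., std 0.1). The raw state
# vectors below are an assumption; the paper does not specify them.
states = rng.standard_normal((S, s_len))
A = rng.normal(0.0, 0.1, size=(d, s_len))
Phi = np.cos(states @ A.T)              # (S, d): one feature row per state
```

With this environment in hand, each agent i would hold a local copy of the d-dimensional parameter and combine TD updates on `Phi` and `R[i]` with consensus steps over `adj`; those update rules follow the paper's Alg. 1 and are not reproduced here.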