Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis

Authors: Gang Wang, Songtao Lu, Georgios Giannakis, Gerald Tesauro, Jian Sun

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present numerical tests to showcase the convergence behaviors of the proposed DTDT as well as the classic DTDL algorithms [26, 11, 33]. The experimental setting is: number of agents N = 100, state-space |S| = 100, length of state |s| = 10, problem dimension d = 10; the communication graph is connected and randomly generated by the Erdős–Rényi model, while the transition matrix P is doubly stochastic and symmetric. Rewards {r(s, a, s′) ≥ 0} were generated randomly, following the uniform distribution over [0, 0.01]. Feature vectors were generated as φ(s) = cos(As), with entries of A ∈ R^{d×|s|} independently drawn from the Gaussian distribution with zero mean and variance 0.01. Figure 1 shows that, with the same stepsize α = 0.3, both the DTDT and DTDL algorithms converge to the same fixed-point solution, but the proposed DTDT achieves a consensus error (disagreement) several orders of magnitude smaller than that of DTDL. From the TD-error viewpoint, our DTDT is also faster than DTDL. These observations are consistent with our analysis, corroborating that gradient tracking is able to track the average of the network's full gradient efficiently at local nodes.
Researcher Affiliation | Collaboration | 1 University of Minnesota, Minneapolis, MN 55455, US; georgios@umn.edu. 2 IBM Research, Yorktown Heights, NY 10598, US; {songtao@ibm.com, gtesauro@us.ibm.com}. 3 Beijing Institute of Technology, Beijing 100081, China; sunjian@bit.edu.cn.
Pseudocode | Yes | Our DTDT algorithm with linear function approximation is tabulated as Alg. 1.
Open Source Code | No | The paper does not provide any concrete access (e.g., repository link, explicit statement of release) to the source code for the methodology described.
Open Datasets | No | The experimental setting is: number of agents N = 100, state-space |S| = 100, length of state |s| = 10, problem dimension d = 10; the communication graph is connected and randomly generated by the Erdős–Rényi model, while the transition matrix P is doubly stochastic and symmetric. Rewards {r(s, a, s′) ≥ 0} were generated randomly, following the uniform distribution over [0, 0.01]. Feature vectors were generated as φ(s) = cos(As), with entries of A ∈ R^{d×|s|} independently drawn from the Gaussian distribution with zero mean and variance 0.01.
Dataset Splits | No | The paper describes generating its own synthetic data and the parameters of the simulation environment but does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper describes the experimental setting and parameters (number of agents, state-space, etc.) but does not specify any hardware used for running the experiments.
Software Dependencies | No | The paper describes the algorithms and numerical results but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | The experimental setting is: number of agents N = 100, state-space |S| = 100, length of state |s| = 10, problem dimension d = 10; the communication graph is connected and randomly generated by the Erdős–Rényi model, while the transition matrix P is doubly stochastic and symmetric. Rewards {r(s, a, s′) ≥ 0} were generated randomly, following the uniform distribution over [0, 0.01]. Feature vectors were generated as φ(s) = cos(As), with entries of A ∈ R^{d×|s|} independently drawn from the Gaussian distribution with zero mean and variance 0.01. Figure 1 shows that, with the same stepsize α = 0.3...
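The quoted experiment setup is concrete enough to sketch in code. Below is a minimal NumPy reconstruction of the synthetic environment only (not the DTDT/DTDL algorithms themselves). Anything the paper does not state is an assumption and is flagged as such in comments: the Erdős–Rényi edge probability p, the raw state vectors fed into φ(s) = cos(As), and the Sinkhorn-style construction used here to obtain a doubly stochastic, symmetric P.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes quoted in the paper's experimental setting.
N = 100      # number of agents
S = 100      # size of the state space |S|
s_len = 10   # length of each state vector |s|
d = 10       # problem/feature dimension

# Communication graph: Erdős–Rényi G(N, p). The edge probability p is an
# assumption (the paper only says the graph is connected and random);
# regeneration until connected is omitted for brevity.
p = 0.1
U = rng.random((N, N))
adj = (((U + U.T) / 2) < p).astype(int)
np.fill_diagonal(adj, 0)

# Transition matrix P: doubly stochastic and symmetric. The paper does not
# say how P was built; one standard construction is symmetric Sinkhorn
# normalization of a random positive symmetric matrix.
M = rng.random((S, S))
M = (M + M.T) / 2
for _ in range(5000):
    M /= M.sum(axis=1, keepdims=True)   # row-normalize
    M = (M + M.T) / 2                   # re-symmetrize
    if np.abs(M.sum(axis=1) - 1.0).max() < 1e-12:
        break
P = M

# Rewards r(s, a, s') >= 0, uniform over [0, 0.01], one table per agent.
R = rng.uniform(0.0, 0.01, size=(N, S, S))

# Features phi(s) = cos(A s), with entries of A in R^{d x |s|} i.i.d.
# Gaussian with mean 0 and variance 0.01 (i.e., std 0.1). The raw state
# vectors below are an assumption; the paper does not specify them.
states = rng.standard_normal((S, s_len))
A = rng.normal(0.0, 0.1, size=(d, s_len))
Phi = np.cos(states @ A.T)              # (S, d): one feature row per state
```

With this environment in hand, each agent i would hold a local copy of the d-dimensional parameter and combine TD updates on `Phi` and `R[i]` with consensus steps over `adj`; those update rules follow the paper's Alg. 1 and are not reproduced here.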