Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis
Authors: Gang Wang, Songtao Lu, Georgios Giannakis, Gerald Tesauro, Jian Sun
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present numerical tests to showcase the convergence behavior of the proposed DTDT as well as the classic DTDL algorithms [26, 11, 33]. The experimental setting is: number of agents N = 100, state-space size |S| = 100, length of state |s| = 10, problem dimension d = 10; the communication graph is connected and randomly generated by the Erdős-Rényi model, while the transition matrix P is doubly stochastic and symmetric. Rewards {r(s, a, s′) ≥ 0} were generated randomly, following the uniform distribution over [0, 0.01]. Feature vectors were generated as φ(s) = cos(As), with entries of A ∈ R^{d×|s|} independently drawn from the Gaussian distribution with zero mean and variance 0.01. Figure 1 shows that, with the same stepsize α = 0.3, both the DTDT and DTDL algorithms converge to the same fixed-point solution, but the proposed DTDT achieves a consensus error (disagreement) several orders of magnitude smaller than that of DTDL. In terms of the size of the TD error, DTDT is also faster than DTDL. These observations are consistent with our analysis, corroborating that gradient tracking is able to track the average of the network's full gradient efficiently at local nodes. |
| Researcher Affiliation | Collaboration | (1) University of Minnesota, Minneapolis, MN 55455, USA; georgios@umn.edu. (2) IBM Research, Yorktown Heights, NY 10598, USA; songtao@ibm.com, gtesauro@us.ibm.com. (3) Beijing Institute of Technology, Beijing 100081, China; sunjian@bit.edu.cn. |
| Pseudocode | Yes | Our DTDT algorithm with linear function approximation is tabulated as Alg. 1 (a generic sketch of one such update appears below the table). |
| Open Source Code | No | The paper does not provide any concrete access (e.g., repository link, explicit statement of release) to the source code for the methodology described. |
| Open Datasets | No | The experimental setting is: number of agents N = 100, state-space size |S| = 100, length of state |s| = 10, problem dimension d = 10; the communication graph is connected and randomly generated by the Erdős-Rényi model, while the transition matrix P is doubly stochastic and symmetric. Rewards {r(s, a, s′) ≥ 0} were generated randomly, following the uniform distribution over [0, 0.01]. Feature vectors were generated as φ(s) = cos(As), with entries of A ∈ R^{d×|s|} independently drawn from the Gaussian distribution with zero mean and variance 0.01. |
| Dataset Splits | No | The paper describes generating its own synthetic data and the parameters of the simulation environment but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper describes the experimental setting and parameters (number of agents, state-space, etc.) but does not specify any hardware used for running the experiments. |
| Software Dependencies | No | The paper describes the algorithms and numerical results but does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | The experimental setting is: number of agents N = 100, state-space size |S| = 100, length of state |s| = 10, problem dimension d = 10; the communication graph is connected and randomly generated by the Erdős-Rényi model, while the transition matrix P is doubly stochastic and symmetric. Rewards {r(s, a, s′) ≥ 0} were generated randomly, following the uniform distribution over [0, 0.01]. Feature vectors were generated as φ(s) = cos(As), with entries of A ∈ R^{d×|s|} independently drawn from the Gaussian distribution with zero mean and variance 0.01. Figure 1 shows that, with the same stepsize α = 0.3... (see the simulation sketch below the table). |
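
Since the paper releases no code, the quoted setup is the only specification of the synthetic environment. Below is a minimal NumPy sketch of one way to instantiate it. The random seed, the Erdős-Rényi edge probability p = 0.1, the Gaussian state embeddings, the Metropolis-weight construction of the doubly stochastic matrices, and the (agent, state) reward indexing are all assumptions the paper does not fix.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper gives none

N, S, s_len, d = 100, 100, 10, 10  # agents, |S|, state length |s|, dimension d

def connected_er_adjacency(n, p, rng):
    """Sample Erdos-Renyi graphs until a connected one is found."""
    eye = np.eye(n)
    while True:
        upper = np.triu(rng.random((n, n)) < p, k=1)
        adj = (upper | upper.T).astype(float)
        # connected iff (I + A)^n has no zero entries
        if (np.linalg.matrix_power(eye + adj, n) > 0).all():
            return adj

def metropolis_weights(adj):
    """Symmetric, doubly stochastic matrix supported on a graph."""
    deg = adj.sum(axis=1)
    w = np.where(adj > 0, 1.0 / (1.0 + np.maximum.outer(deg, deg)), 0.0)
    np.fill_diagonal(w, 1.0 - w.sum(axis=1))
    return w

# communication graph among agents and its mixing matrix
W = metropolis_weights(connected_er_adjacency(N, 0.1, rng))

# doubly stochastic, symmetric transition matrix over states (one of
# several constructions consistent with the quoted description)
P = metropolis_weights(connected_er_adjacency(S, 0.1, rng))

# rewards ~ Uniform[0, 0.01]; indexing by (agent, state) is a simplification
R = rng.uniform(0.0, 0.01, size=(N, S))

# features phi(s) = cos(A s), entries of A ~ N(0, 0.01), i.e. std 0.1
states = rng.standard_normal((S, s_len))  # state embeddings: an assumption
A = rng.normal(0.0, 0.1, size=(d, s_len))
Phi = np.cos(states @ A.T)                # (S, d) feature matrix
```

Metropolis weights are used here because they yield a symmetric, doubly stochastic matrix on any connected graph, matching both the mixing-matrix and transition-matrix requirements in the quote; the paper may well use a different construction.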
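The report confirms that pseudocode exists (Alg. 1) but does not reproduce it. For orientation only, here is a generic sketch of one synchronous decentralized TD(0) step with gradient tracking, consistent with the quoted claim that each node tracks the network-average update direction. The function name `dtdt_step`, the shared-trajectory sampling, and the exact placement of the tracking correction are assumptions, not the paper's Alg. 1.

```python
import numpy as np

def dtdt_step(theta, y, g_prev, W, phi_s, phi_next, r_local, gamma, alpha):
    """One synchronous step: all N agents observe the same transition
    (s, s') but receive private rewards r_local (shape (N,)); theta, y,
    g_prev are (N, d) arrays and W is the doubly stochastic mixing matrix.
    """
    # local TD(0) semi-gradient direction at each agent, shape (N, d)
    td_err = r_local + gamma * (theta @ phi_next) - (theta @ phi_s)  # (N,)
    g = td_err[:, None] * phi_s[None, :]
    # parameter update: consensus mixing plus a step along the tracked
    # network-average direction
    theta_new = W @ theta + alpha * y
    # gradient tracking: mix the trackers, then add the local innovation
    y_new = W @ y + (g - g_prev)
    return theta_new, y_new, g
```

With y initialized to the initial local TD directions and W doubly stochastic, summing the y-recursion over agents shows the average of y always equals the average of the current local directions; this tracking property is what the quoted text credits for the orders-of-magnitude smaller consensus error relative to DTDL.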