Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

Authors: Kaiqing Zhang, Zhuoran Yang, Tamer Başar

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Simulation results are also provided to illustrate the satisfactory convergence properties of the algorithms." |
| Researcher Affiliation | Academia | Kaiqing Zhang, ECE and CSL, University of Illinois at Urbana-Champaign (kzhang66@illinois.edu); Zhuoran Yang, ORFE, Princeton University (zy6@princeton.edu); Tamer Başar, ECE and CSL, University of Illinois at Urbana-Champaign (basar1@illinois.edu) |
| Pseudocode | No | The paper describes its algorithms through mathematical equations and textual explanations, but does not present them in a structured pseudocode block or algorithm environment. |
| Open Source Code | No | No explicit statement or link providing concrete access to source code for the methodology described in this paper was found. |
| Open Datasets | No | The paper describes two simulation settings (Case 1 and Case 2) with specific matrix parameters, which are "created based on the simulations in [35]". It does not refer to a publicly available dataset with concrete access information (link, DOI, or a formal citation for the dataset itself). |
| Dataset Splits | No | The paper describes simulation settings with defined system parameters but does not provide train/validation/test splits or any methodology for data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, cloud resources with specs) used to run its simulations or experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies (e.g., programming languages, libraries, or solvers with version numbers) used for its implementation or simulations. |
| Experiment Setup | No | While the paper defines system parameters for its simulations (matrices A, B, C, Q, Ru, Rv, Σ0), it does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or detailed training configurations for the experiments in Section 6. |
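For context on what the simulation setup row refers to: the paper's experiments define a zero-sum linear quadratic game by system matrices (A, B, C) and cost weights (Q, Ru, Rv). A minimal sketch of evaluating the game cost under linear policies u = -Kx and v = -Lx is shown below; the matrices and gains here are illustrative placeholders, not the paper's Case 1 or Case 2 parameters, which are not reproduced on this page.

```python
import numpy as np

# Hypothetical 2-D zero-sum LQ game. These matrices are placeholders
# chosen for illustration only, NOT the paper's Case 1/Case 2 settings.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[1.0], [0.0]])   # minimizer's input matrix
C = np.array([[0.0], [1.0]])   # maximizer's input matrix
Q = np.eye(2)                  # state cost weight
Ru = np.array([[1.0]])         # minimizer's control weight
Rv = np.array([[5.0]])         # maximizer's control weight

def game_cost(K, L, x0, T=200):
    """Finite-horizon zero-sum cost under linear policies u = -Kx, v = -Lx:
    sum_t  x'Qx + u'Ru u - v'Rv v  (minimizer pays, maximizer receives)."""
    x = x0.copy()
    cost = 0.0
    for _ in range(T):
        u = -K @ x
        v = -L @ x
        cost += float(x @ Q @ x + u @ Ru @ u - v @ Rv @ v)
        x = A @ x + B @ u + C @ v   # noiseless rollout for simplicity
    return cost

# Hand-picked stabilizing gains: A - BK - CL has eigenvalues 0.4 and 0.7.
K = np.array([[0.5, 0.0]])
L = np.array([[0.0, 0.1]])
J = game_cost(K, L, x0=np.array([1.0, 1.0]))
```

In the paper's algorithms such cost evaluations would feed (projected) nested policy-gradient updates on K and L; the sketch above only covers the cost model those updates act on.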