Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
Authors: Julien Perolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia. In this section, AGPI-Q is tested on the Alesia game described in Sec. 2.5, where we assume that both players start with a budget n = 20. As a baseline, we use the exact solution of the problem provided by VI. We have run the algorithm for K = 10 iterations and for m ∈ {1, 2, 3, 4, 5} evaluation steps. We have considered different sample set sizes, N = 2500, 5000, 10000. Each experiment is repeated 20 times. |
| Researcher Affiliation | Academia | (1) Univ. Lille, CRIStAL, SequeL team, France (2) Inria, Villers-lès-Nancy, F-54600, France (3) Institut Universitaire de France (IUF), France |
| Pseudocode | Yes | Algorithm 1 (AGPI-Q for batch samples). Input: samples ((x_j, a_j^1, a_j^2), r_j, x'_j), j = 1, ..., N; a Q-function q_0 = 0; a hypothesis space F. For k = 1, 2, ..., K: Greedy step: for all j, a_j = argmax_a min_b q_{k-1}(x'_j, a, b) (solving a matrix game). Evaluation step: set q_{k,0} = q_{k-1}; for i = 1, ..., m: for all j, q^j = r(x_j, a_j^1, a_j^2) + γ min_b q_{k,i-1}(x'_j, a_j, b), then q_{k,i} = argmin_{q ∈ F} Σ_{j=1}^N ℓ(q(x_j, a_j^1, a_j^2), q^j), where ℓ is a loss function. Set q_k = q_{k,m}. Output: q_K. (A hedged code sketch of this procedure follows the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | No | The paper describes the Alesia game and how samples were generated, but does not provide concrete access information (link, DOI, repository name, or formal citation with authors/year) for a publicly available or open dataset used in the experiments. |
| Dataset Splits | No | The paper describes generating N uniform samples and using them in a batch setting, but does not specify explicit training, validation, or test dataset splits for these samples. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using CART trees and linear programming techniques, but does not specify any software names with version numbers for replication. |
| Experiment Setup | Yes | We have run the algorithm for K = 10 iterations and for m ∈ {1, 2, 3, 4, 5} evaluation steps. We have considered different sample set sizes, N = 2500, 5000, 10000. Each experiment is repeated 20 times. (See the parameter-grid sketch after this table.) |
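
The quoted pseudocode is compact, so a minimal Python sketch of AGPI-Q is given below, assuming a finite joint action set A1 × A2, a plain state-action feature concatenation, and scikit-learn's `DecisionTreeRegressor` as a stand-in for the CART-tree hypothesis space F mentioned in the paper; none of these implementation choices are specified by the authors.

```python
# Minimal sketch of Algorithm 1 (AGPI-Q for batch samples), reconstructed from the
# pseudocode quoted above. Feature encoding, regressor, and action sets are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def agpi_q(samples, A1, A2, gamma, K, m):
    """samples: list of ((x, a1, a2), r, x_next) tuples, with x and x_next 1-D arrays."""
    regressor = None  # q_0 = 0 everywhere

    def featurize(x, a1, a2):
        return np.concatenate([np.asarray(x, dtype=float), [a1, a2]])

    def q(reg, x, a1, a2):
        return 0.0 if reg is None else reg.predict([featurize(x, a1, a2)])[0]

    for _k in range(K):
        # Greedy step: a_j = argmax_a min_b q_{k-1}(x'_j, a, b); the quoted pseudocode
        # takes the pure maximin action of the induced matrix game.
        greedy = []
        for (_sa, _r, x_next) in samples:
            values = np.array([[q(regressor, x_next, a, b) for b in A2] for a in A1])
            greedy.append(A1[int(np.argmax(values.min(axis=1)))])

        # Evaluation step: m fitted Bellman backups for the greedy policy, q_{k,0} = q_{k-1}.
        eval_reg = regressor
        for _i in range(m):
            X, y = [], []
            for j, ((x, a1, a2), r, x_next) in enumerate(samples):
                target = r + gamma * min(q(eval_reg, x_next, greedy[j], b) for b in A2)
                X.append(featurize(x, a1, a2))
                y.append(target)
            eval_reg = DecisionTreeRegressor().fit(X, y)  # squared loss as l, CART as F

        regressor = eval_reg  # q_k = q_{k,m}

    return regressor  # approximation of q_K
```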
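
For the experiment setup, a hypothetical driver mirroring the reported grid (K = 10, m ∈ {1, ..., 5}, N ∈ {2500, 5000, 10000}, 20 repetitions per setting) is sketched below, reusing the `agpi_q` function above. The `toy_batch` sampler, action sets, and discount factor are placeholders; the actual Alesia transitions and the exact value-iteration baseline comparison are not reconstructed here.

```python
import numpy as np

# Assumed toy setup for illustration only; the paper's Alesia simulator, state
# encoding, and discount factor are not reproduced here.
A1 = A2 = [0, 1, 2]
GAMMA = 0.9

def toy_batch(N, rng):
    """Stand-in batch of ((x, a1, a2), r, x') tuples with random 2-D toy states."""
    return [((rng.random(2), rng.choice(A1), rng.choice(A2)), rng.random(), rng.random(2))
            for _ in range(N)]

rng = np.random.default_rng(0)
for N in (2500, 5000, 10000):      # sample set sizes reported in the paper
    for m in (1, 2, 3, 4, 5):      # evaluation steps reported in the paper
        for run in range(20):      # 20 repetitions per setting
            q_K = agpi_q(toy_batch(N, rng), A1, A2, gamma=GAMMA, K=10, m=m)
            # compare q_K against the exact value-iteration baseline here
```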