Representation Learning for Low-rank General-sum Markov Games
Authors: Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Zihan Ding, Chi Jin, Mengdi Wang
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We accompany our theoretical result with a neural network-based implementation of our algorithm and evaluate it against the widely used deep RL baseline, DQN with fictitious play. In this section we investigate our algorithm with proof-of-concept empirical studies. |
| Researcher Affiliation | Academia | Chengzhuo Ni Princeton University cn10@princeton.edu Yuda Song Carnegie Mellon University yudas@andrew.cmu.edu Xuezhou Zhang Princeton University xz7392@princeton.edu Zihan Ding Princeton University zihand@princeton.edu Chi Jin Princeton University chij@princeton.edu Mengdi Wang Princeton University mengdiw@princeton.edu |
| Pseudocode | Yes | Algorithm 1 General Representation Learning for Multi-player General-sum Low-Rank Markov Game with UCB-driven Exploration (GERL_MG2) ... Algorithm 5 Model-free Representation Learning in Practice |
| Open Source Code | Yes | We also submit anonymous code in the supplemental materials. |
| Open Datasets | No | We design our Block Markov game by first randomly generating a tabular Markov game with horizon H, 3 states, 2 players each with 3 actions, and random reward matrix $R_h \in (0, 1)^{3 \times 3^2 \times H}$ and random transition matrix $T_h(s_h, a_h) \in \Delta(S_{h+1})$. For the generation of rich observation (emission distribution), we follow the experiment design of (Misra et al., 2020). The paper describes a custom-generated environment rather than using a publicly available dataset and does not provide access details for the generated instances; a hedged generation sketch follows the table. |
| Dataset Splits | No | The paper describes training and evaluating policies within a simulated Markov game environment, but it does not specify traditional train/validation/test dataset splits with percentages or counts, as data is collected dynamically through interaction with the environment rather than from a pre-defined static dataset. |
| Hardware Specification | No | The paper does not mention any specific hardware specifications (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | Specifically, we implement Algorithm 3 with deep learning libraries (Paszke et al., 2017). The paper mentions 'deep learning libraries' and cites Paszke et al. (2017), which corresponds to PyTorch, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We include the hyperparameters for GERL_MG2 in Table 2, and the hyperparameters for DQN in Table 3 and Table 4. These tables list specific values such as learning rate, batch size, and hidden layer sizes. |
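The environment in the Open Datasets row is generated on the fly rather than downloaded. Below is a minimal sketch of how a random tabular Markov game with the described shape (3 states, 2 players with 3 actions each, rewards in (0, 1), horizon H) could be constructed; it is not the authors' released code. The function name `make_random_tabular_markov_game`, the default horizon, the per-player reward axis, and drawing transition rows by normalizing uniform weights are all illustrative assumptions.

```python
import numpy as np

def make_random_tabular_markov_game(H=5, n_states=3, n_actions=3, n_players=2, seed=0):
    """Sketch: randomly generate a tabular general-sum Markov game of the
    shape described in the paper. Names, defaults, and the per-player
    reward axis are assumptions for illustration only."""
    rng = np.random.default_rng(seed)
    n_joint = n_actions ** n_players  # joint action space, 3^2 = 9 here

    # Reward entries drawn uniformly from (0, 1), indexed by
    # (step h, state, joint action, player).
    rewards = rng.uniform(0.0, 1.0, size=(H, n_states, n_joint, n_players))

    # Transition kernel T_h(s, a) as a distribution over next states:
    # draw unnormalized uniform weights and normalize each row
    # (one simple way to obtain valid random distributions).
    raw = rng.uniform(size=(H, n_states, n_joint, n_states))
    transitions = raw / raw.sum(axis=-1, keepdims=True)

    return rewards, transitions

# Example: transitions[h, s, a] is a probability vector over next states.
R, T = make_random_tabular_markov_game(H=5)
assert np.allclose(T.sum(axis=-1), 1.0)
```

The rich-observation layer (the emission distribution following Misra et al., 2020) would be added on top of this tabular game; that step is omitted here.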