Representation Learning for Low-rank General-sum Markov Games

Authors: Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Zihan Ding, Chi Jin, Mengdi Wang

ICLR 2023

Reproducibility assessment (variable, result, and LLM response for each item):

Research Type: Experimental
LLM Response: "We accompany our theoretical result with a neural network-based implementation of our algorithm and evaluate it against the widely used deep RL baseline, DQN with fictitious play. In this section we investigate our algorithm with proof-of-concept empirical studies."

Researcher Affiliation: Academia
LLM Response: Chengzhuo Ni (Princeton University, cn10@princeton.edu); Yuda Song (Carnegie Mellon University, yudas@andrew.cmu.edu); Xuezhou Zhang (Princeton University, xz7392@princeton.edu); Zihan Ding (Princeton University, zihand@princeton.edu); Chi Jin (Princeton University, chij@princeton.edu); Mengdi Wang (Princeton University, mengdiw@princeton.edu)

Pseudocode: Yes
LLM Response: "Algorithm 1: General Representation Learning for Multi-player General-sum Low-Rank Markov Game with UCB-driven Exploration (GERL_MG2)" ... "Algorithm 5: Model-free Representation Learning in Practice"

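The summary does not reproduce the algorithms themselves. For orientation only: "UCB-driven exploration" in low-rank settings is typically implemented as an elliptical bonus over learned features. The following minimal sketch illustrates that standard construction; it is not necessarily the exact bonus used by GERL_MG2, and the function name and scaling are assumptions:

```python
import numpy as np

def elliptical_bonus(phi, Phi_data, lam=1.0, alpha=1.0):
    """Generic UCB-style exploration bonus over learned features:
    b(s, a) = alpha * sqrt(phi^T Sigma^{-1} phi), where Sigma is the
    regularized empirical covariance of previously seen features.
    A standard construction for low-rank models, not the paper's
    verbatim algorithm."""
    d = phi.shape[0]
    Sigma = lam * np.eye(d) + Phi_data.T @ Phi_data   # d x d covariance
    return alpha * np.sqrt(phi @ np.linalg.solve(Sigma, phi))
```
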
Open Source Code: Yes
LLM Response: "We also submit anonymous code in the supplemental materials."

Open Datasets: No
LLM Response: "We design our Block Markov game by first randomly generating a tabular Markov game with horizon H, 3 states, 2 players each with 3 actions, a random reward matrix R_h ~ Unif(0, 1)^{3×3²×H}, and a random transition matrix T_h(s_h, a_h) ∈ Δ(S_{h+1}). For the generation of rich observations (the emission distribution), we follow the experiment design of Misra et al. (2020)." The paper describes a custom-generated environment rather than a publicly available dataset, and it does not provide access details for the generated instances.

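To make the environment-generation recipe concrete, here is a minimal sketch of sampling such a random tabular Markov game (not the authors' code; the horizon value, seeding, and normalization scheme are assumptions):

```python
import numpy as np

H, S, A = 5, 3, 3          # horizon (value arbitrary here), 3 states, 3 actions per player
rng = np.random.default_rng(0)

# Per-player rewards for each step, state, and joint action:
# shapes follow the quoted description, 3 states x 3^2 joint actions x H steps.
rewards = rng.uniform(0.0, 1.0, size=(2, H, S, A * A))

# Transition kernel T_h(s' | s, a): sample positive weights and normalize
# so each T_h(s, a) is a distribution over next states, i.e. lies in Δ(S).
weights = rng.uniform(size=(H, S, A * A, S))
transitions = weights / weights.sum(axis=-1, keepdims=True)

def step(h, s, a1, a2):
    """Sample the next state and both players' rewards at step h."""
    a = a1 * A + a2                      # index of the joint action
    s_next = rng.choice(S, p=transitions[h, s, a])
    return s_next, rewards[:, h, s, a]
```
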
Dataset Splits: No
LLM Response: The paper describes training and evaluating policies within a simulated Markov game environment, but it does not specify traditional train/validation/test dataset splits with percentages or counts, since data is collected dynamically through interaction with the environment rather than drawn from a pre-defined static dataset.

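For readers expecting static splits, a minimal sketch of the interaction-driven data collection this entry refers to (all names here are hypothetical; `step_fn` could be the `step` function from the sketch above):

```python
def collect_episode(policy, step_fn, H=5, s0=0):
    """Roll out one episode by interacting with the simulated game;
    the training 'data' is the resulting trajectory, not a static split."""
    trajectory, s = [], s0
    for h in range(H):
        a1, a2 = policy(h, s)               # both players pick actions
        s_next, r = step_fn(h, s, a1, a2)   # environment transition
        trajectory.append((h, s, a1, a2, r, s_next))
        s = s_next
    return trajectory
```
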
Hardware Specification: No
LLM Response: The paper does not mention any specific hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments.

Software Dependencies: No
LLM Response: "Specifically, we implement Algorithm 3 with deep learning libraries (Paszke et al., 2017)." The paper mentions "deep learning libraries" and cites Paszke et al. (2017), which corresponds to PyTorch, but it does not provide specific version numbers for these or any other software dependencies.

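Since versions are not reported, reproducers will need to record them on their end; one minimal way to log the versions in use (assuming a PyTorch/NumPy stack):

```python
# Log the exact versions in use, since the paper does not pin them.
import numpy, torch
print("torch", torch.__version__, "| numpy", numpy.__version__)
```
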
Experiment Setup: Yes
LLM Response: "We include the hyperparameters for GERL_MG2 in Table 2, and the hyperparameters for DQN in Table 3 and Table 4." These tables list specific values such as learning rate, batch size, and hidden layer sizes.

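For illustration, the categories those tables cover could be recorded as a config like the one below; every value here is a placeholder, not a number from the paper:

```python
# Hypothetical hyperparameter record mirroring the categories in Tables 2-4;
# none of these values are taken from the paper.
config = {
    "learning_rate": 1e-3,
    "batch_size": 128,
    "hidden_sizes": (256, 256),
}
```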