Temporal Induced Self-Play for Stochastic Bayesian Games

Authors: Weizhe Chen, Zihan Zhou, Yi Wu, Fei Fang

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We test TISP-based algorithms in various games, including finitely repeated security games and a grid-world game. The results show that TISP-PG is more scalable than existing mathematical programming-based methods and significantly outperforms other learning-based methods. |
| Researcher Affiliation | Academia | Shanghai Jiao Tong University; Shanghai Qi Zhi Institute; Tsinghua University; Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 (Temporal-Induced Self-Play); Algorithm 2 (Compute Test-Time Strategy). A hedged sketch of a backward-induction self-play loop appears after this table. |
| Open Source Code | No | The paper does not link to source code, nor does it state that code for the methodology is released or available. |
| Open Datasets | No | The experiments use custom or adapted game environments (Finitely Repeated Security Game, Exposing Game, Tagging Game) rather than standard, publicly available datasets. No dataset links or formal citations (author, year) for a public dataset are provided. |
| Dataset Splits | No | The paper does not specify training, validation, and test splits (percentages, sample counts, or references to predefined splits); it trains agents within game environments rather than on static datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions training times and sample counts. |
| Software Dependencies | No | The paper mentions using "deep reinforcement learning" but does not specify any software names with version numbers (e.g., Python 3.x, PyTorch x.x, CUDA x.x, specific libraries or solvers with versions). |
| Experiment Setup | No | The paper states that "Full experiment details can be found in Appx. D.", but this appendix is not included in the provided text. The main body does not contain specific hyperparameters (e.g., learning rate, batch size, number of epochs) or other detailed training configuration. |
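
For orientation, the sketch below shows what a backward-induction self-play loop of the general shape named by Algorithm 1 might look like. This is an assumption-laden illustration, not the paper's method: `Policy`, `rollout`, `H`, `N_ITERS`, and the placeholder belief and reward values are all hypothetical, and the actual update rule (TISP-PG uses policy gradients) is specified in the paper and its Appendix D.

```python
# Hypothetical sketch of a backward-induction self-play loop; NOT the
# paper's Algorithm 1. All names (Policy, rollout, H, N_ITERS) and the
# placeholder belief/reward values are assumptions for illustration.
import random

H = 5          # assumed horizon of a finitely repeated game
N_ITERS = 100  # assumed training iterations per time step

class Policy:
    """Placeholder for a belief-conditioned policy at one time step."""
    def act(self, belief):
        return random.choice([0, 1])  # stand-in for a learned action
    def update(self, trajectory):
        pass  # stand-in for a TISP-PG-style policy-gradient step

def rollout(policies, t):
    """Simulate the subgame from step t onward with later-step policies fixed."""
    belief = 0.5  # stand-in prior belief over opponent types
    trajectory = []
    for step in range(t, H):
        action = policies[step].act(belief)
        reward = 0.0  # stand-in payoff from the (unspecified) game
        trajectory.append((belief, action, reward))
    return trajectory

# Train backward in time: step t is optimized against the already-trained
# policies for steps t+1 .. H-1, in the spirit of backward induction.
policies = [Policy() for _ in range(H)]
for t in reversed(range(H)):
    for _ in range(N_ITERS):
        policies[t].update(rollout(policies, t))
```

The structural point of the sketch is that each step's policy is trained while later steps' policies stay fixed, so strategies are learned from the final round back toward the first, which is presumably what the "temporal-induced" decomposition refers to.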