A Regularized Opponent Model with Maximum Entropy Objective
Authors: Zheng Tian, Ying Wen, Zhichen Gong, Faiz Punakkath, Shihao Zou, Jun Wang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate these two algorithms on the challenging iterated matrix game and differential game respectively and show that they can outperform strong MARL baselines. |
| Researcher Affiliation | Academia | Zheng Tian¹, Ying Wen¹, Zhichen Gong¹, Faiz Punakkath¹, Shihao Zou², Jun Wang¹ — ¹University College London, ²University of Alberta |
| Pseudocode | Yes | We list the pseudo-code of ROMMEO-Q and ROMMEO-AC in Appendix A. |
| Open Source Code | Yes | The experiment code and appendix are available at https://github.com/rommeoijcai2019/rommeo. |
| Open Datasets | No | The paper uses custom environments ('iterated matrix games' and the differential 'Max of Two Quadratic' game) rather than pre-existing datasets with explicit public access information such as links or formal citations. |
| Dataset Splits | No | The experiments take place in reinforcement learning environments (iterated matrix games, differential games) where data is generated through agent-environment interaction, so no explicit train/validation/test dataset splits are described. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for replication. |
| Experiment Setup | No | The paper reports general aspects of the experimental setup, such as the number of episodes and steps, and discusses exploration control, but omits the specific numerical hyperparameters (e.g., learning rate, batch size, optimizer settings) needed for replication. |