Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Regularized Opponent Model with Maximum Entropy Objective
Authors: Zheng Tian, Ying Wen, Zhichen Gong, Faiz Punakkath, Shihao Zou, Jun Wang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate these two algorithms on the challenging iterated matrix game and differential game respectively and show that they can outperform strong MARL baselines. |
| Researcher Affiliation | Academia | Zheng Tian¹, Ying Wen¹, Zhichen Gong¹, Faiz Punakkath¹, Shihao Zou² and Jun Wang¹; ¹University College London, ²University of Alberta |
| Pseudocode | Yes | We list the pseudo-code of ROMMEO-Q and ROMMEO-AC in Appendix A. |
| Open Source Code | Yes | The experiment code and appendix are available at https://github.com/rommeoijcai2019/rommeo. |
| Open Datasets | No | The paper uses environments like 'iterated matrix games' and 'differential Max of Two Quadratic Game' rather than pre-existing datasets with explicit public access information like links or formal citations. |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments (iterated matrix games, differential games) where data is generated through interaction, and therefore explicit train/validation/test dataset splits are not described. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for replication. |
| Experiment Setup | No | The paper mentions general aspects of the experimental setup like the number of episodes and steps, and discusses exploration control, but does not provide specific numerical hyperparameters (e.g., learning rate, batch size, optimizer details) for replication. |