Influencing Long-Term Behavior in Multiagent Reinforcement Learning

Authors: Dong-Ki Kim, Matthew Riemer, Miao Liu, Jakob Foerster, Michael Everett, Chuangchuang Sun, Gerald Tesauro, Jonathan P. How

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluation of our approach (Section 4). We demonstrate that our method consistently converges to a more desirable limiting distribution than baseline methods that either neglect the learning of others [14] or consider their learning with a myopic perspective [8] in various multiagent benchmark domains.
Researcher Affiliation | Collaboration | MIT-LIDS, IBM Research, MIT-IBM Watson AI Lab, Mila, University of Oxford
Pseudocode | Yes | We provide further details, including implementation for k>1 and pseudocode, in Appendix E.
Open Source Code | Yes | The code is available at https://bit.ly/3fXArAo, and video highlights are available at https://bit.ly/37IWeb9.
Open Datasets | No | The paper mentions benchmark domains such as Bach/Stravinsky, Coordination, Matching Pennies, MuJoCo RoboSumo, and MAgent Battle. However, it does not provide concrete access information (links, DOIs, specific repository names, or formal citations with author and year) for any publicly available datasets used for training within these environments.
Dataset Splits | Yes | 3. (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix G.
Hardware Specification | Yes | All experiments are run on a machine with 32 Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz cores and 8 NVIDIA Tesla V100 32GB GPUs.
Software Dependencies | No | The paper refers to general frameworks and algorithms such as 'soft actor-critic [27]' and 'variational inference [28]' but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, scikit-learn) needed for reproduction.
Experiment Setup | Yes | We refer to Appendix G for experimental details and hyperparameters. (...) 3. (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix G.