Influence-Based Multi-Agent Exploration
Authors: Tonghan Wang*, Jianhao Wang*, Yi Wu, Chongjie Zhang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically demonstrate the significant strength of our methods in a variety of multi-agent scenarios. |
| Researcher Affiliation | Collaboration | Tonghan Wang, Jianhao Wang, Yi Wu & Chongjie Zhang; Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; wangth18@mails.tsinghua.edu.cn, wjh720.eric@gmail.com, jxwuyi@openai.com, chongjie@tsinghua.edu.cn |
| Pseudocode | No | The paper presents mathematical derivations and equations, but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states 'The video of experiments is available at https://sites.google.com/view/influence-based-mae/' and refers to the 'OpenAI implementation of PPO2' for its framework, but it does not provide an explicit link or statement about releasing its own source code for the described methodology. |
| Open Datasets | No | The paper describes custom multi-agent tasks (Pass, Secret-Room, Push-Box, Island, Large-Island) within a 'discrete version of multi-agent particle world environment' and details their setup in Appendix D, but it does not provide explicit access information (link, DOI, formal citation) to these environments or datasets for public access. |
| Dataset Splits | No | The paper describes experimental setup and evaluation procedures but does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split files). |
| Hardware Specification | Yes | We train our models on an NVIDIA RTX 2080TI GPU using experience sampled from 32 parallel environments. |
| Software Dependencies | No | The paper mentions using the 'OpenAI implementation of PPO2' and the 'Adam optimizer' but does not specify version numbers for these or any other software libraries or dependencies, such as Python, PyTorch, or TensorFlow versions. |
| Experiment Setup | Yes | For all our methods and baselines, we use η/√N(s) as the exploration bonus for the N(s)-th visit to state s. Specific values of η and scaling weights can be found in Table 2. ... We use the Adam optimizer (Kingma & Ba, 2014) with learning rate 1×10⁻³ and batch size 2048. ... The horizon of one episode is set to 300 timesteps in all these tasks. |
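
The exploration bonus quoted in the Experiment Setup row is a standard count-based scheme: the N(s)-th visit to state s earns a bonus of η/√N(s). The minimal sketch below, assuming a hashable discrete state key and an illustrative η = 0.1 (the paper's actual η values and scaling weights are given in its Table 2), shows how such a bonus decays with repeated visits; the `CountBasedBonus` class is a hypothetical helper, not code from the paper.

```python
import math
from collections import defaultdict


class CountBasedBonus:
    """Count-based exploration bonus: eta / sqrt(N(s)) for the N(s)-th visit to state s."""

    def __init__(self, eta: float = 0.1):
        self.eta = eta
        self.visit_counts = defaultdict(int)  # N(s), keyed by a hashable state

    def bonus(self, state) -> float:
        """Increment N(s) for this visit and return eta / sqrt(N(s))."""
        self.visit_counts[state] += 1
        return self.eta / math.sqrt(self.visit_counts[state])


# Example: the bonus decays as the same discrete state is revisited.
bonus_fn = CountBasedBonus(eta=0.1)
print([round(bonus_fn.bonus(("room_1", 3, 4)), 4) for _ in range(4)])
# [0.1, 0.0707, 0.0577, 0.05]
```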