Influence-Based Multi-Agent Exploration

Authors: Tonghan Wang*, Jianhao Wang*, Yi Wu, Chongjie Zhang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Finally, we empirically demonstrate the significant strength of our methods in a variety of multi-agent scenarios.'
Researcher Affiliation | Collaboration | Tonghan Wang, Jianhao Wang, Yi Wu & Chongjie Zhang, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. wangth18@mails.tsinghua.edu.cn, wjh720.eric@gmail.com, jxwuyi@openai.com, chongjie@tsinghua.edu.cn
Pseudocode | No | The paper presents mathematical derivations and equations but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper states 'The video of experiments is available at https://sites.google.com/view/influence-based-mae/' and refers to the 'OpenAI implementation of PPO2' as its framework, but it does not provide an explicit link or statement about releasing its own source code for the described method.
Open Datasets | No | The paper describes custom multi-agent tasks (Pass, Secret-Room, Push-Box, Island, Large-Island) built on a discrete version of the multi-agent particle-world environment and details their setup in Appendix D, but it does not provide explicit access information (link, DOI, or formal citation) for public access to these environments.
Dataset Splits | No | The paper describes its experimental setup and evaluation procedure but does not explicitly provide training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split files).
Hardware Specification | Yes | 'We train our models on an NVIDIA RTX 2080TI GPU using experience sampled from 32 parallel environments.'
Software Dependencies | No | The paper mentions using the 'OpenAI implementation of PPO2' and the Adam optimizer, but does not specify version numbers for these or any other software dependencies, such as Python, PyTorch, or TensorFlow versions.
Experiment Setup | Yes | 'For all our methods and baselines, we use η/√N(s) as the exploration bonus for the N(s)-th visit to state s. Specific values of η and scaling weights can be found in Table 2. ... We use Adam optimizer (Kingma & Ba, 2014) with learning rate 1 × 10⁻³ and batch size 2048. ... The horizon of one episode is set to 300 timesteps in all these tasks.' (A minimal sketch of this exploration bonus follows the table.)
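The count-based exploration bonus quoted in the Experiment Setup row is simple enough to illustrate. Below is a minimal Python sketch of a bonus of the form η/√N(s) over a discrete, hashable state representation. The class name, the η value, and the usage pattern are illustrative assumptions, not taken from the authors' code (which is not released); the actual η values and scaling weights are listed in the paper's Table 2.

```python
# Minimal sketch of a count-based exploration bonus eta / sqrt(N(s)),
# assuming states are hashable (e.g., discretized grid positions).
from collections import defaultdict


class CountBasedBonus:
    """Returns eta / sqrt(N(s)) for the N(s)-th visit to state s."""

    def __init__(self, eta: float = 0.1):
        self.eta = eta                  # illustrative value; task-specific in the paper (Table 2)
        self.counts = defaultdict(int)  # N(s): visit count per state

    def bonus(self, state) -> float:
        self.counts[state] += 1         # this call is the N(s)-th visit to `state`
        return self.eta / (self.counts[state] ** 0.5)


# Usage: add the bonus to the environment reward during training.
bonus_fn = CountBasedBonus(eta=0.1)
state = (3, 4)                          # hypothetical discretized agent position
env_reward = 1.0
shaped_reward = env_reward + bonus_fn.bonus(state)
```

The bonus decays with repeated visits, so novel states are rewarded more than familiar ones; how this bonus is combined with the paper's influence-based intrinsic reward is described in the paper itself and is not reproduced here.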