Influence-Based Multi-Agent Exploration

Authors: Tonghan Wang*, Jianhao Wang*, Yi Wu, Chongjie Zhang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Finally, we empirically demonstrate the significant strength of our methods in a variety of multi-agent scenarios.'
Researcher Affiliation | Collaboration | Tonghan Wang, Jianhao Wang, Yi Wu & Chongjie Zhang, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. wangth18@mails.tsinghua.edu.cn, wjh720.eric@gmail.com, jxwuyi@openai.com, chongjie@tsinghua.edu.cn
Pseudocode | No | The paper presents mathematical derivations and equations but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper states 'The video of experiments is available at https://sites.google.com/view/influence-based-mae/' and refers to the 'OpenAI implementation of PPO2' as its framework, but it does not provide an explicit link or statement about releasing its own source code for the described method.
Open Datasets | No | The paper describes custom multi-agent tasks (Pass, Secret-Room, Push-Box, Island, Large-Island) built on a discrete version of the multi-agent particle-world environment and details their setup in Appendix D, but it does not provide explicit access information (link, DOI, or formal citation) for public access to these environments.
Dataset Splits | No | The paper describes its experimental setup and evaluation procedure but does not explicitly provide training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split files).
Hardware Specification | Yes | 'We train our models on an NVIDIA RTX 2080TI GPU using experience sampled from 32 parallel environments.'
Software Dependencies | No | The paper mentions using the 'OpenAI implementation of PPO2' and the Adam optimizer, but does not specify version numbers for these or any other software dependencies, such as Python, PyTorch, or TensorFlow versions.
Experiment Setup | Yes | 'For all our methods and baselines, we use η/√N(s) as the exploration bonus for the N(s)-th visit to state s. Specific values of η and scaling weights can be found in Table 2. ... We use Adam optimizer (Kingma & Ba, 2014) with learning rate 1 × 10⁻³ and batch size 2048. ... The horizon of one episode is set to 300 timesteps in all these tasks.' (A minimal sketch of this exploration bonus follows the table.)
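The count-based exploration bonus quoted in the Experiment Setup row is simple enough to illustrate. Below is a minimal Python sketch of a bonus of the form η/√N(s) over a discrete, hashable state representation. The class name, the η value, and the usage pattern are illustrative assumptions, not taken from the authors' code (which is not released); the actual η values and scaling weights are listed in the paper's Table 2.

```python
# Minimal sketch of a count-based exploration bonus eta / sqrt(N(s)),
# assuming states are hashable (e.g., discretized grid positions).
from collections import defaultdict


class CountBasedBonus:
    """Returns eta / sqrt(N(s)) for the N(s)-th visit to state s."""

    def __init__(self, eta: float = 0.1):
        self.eta = eta                  # illustrative value; task-specific in the paper (Table 2)
        self.counts = defaultdict(int)  # N(s): visit count per state

    def bonus(self, state) -> float:
        self.counts[state] += 1         # this call is the N(s)-th visit to `state`
        return self.eta / (self.counts[state] ** 0.5)


# Usage: add the bonus to the environment reward during training.
bonus_fn = CountBasedBonus(eta=0.1)
state = (3, 4)                          # hypothetical discretized agent position
env_reward = 1.0
shaped_reward = env_reward + bonus_fn.bonus(state)
```

The bonus decays with repeated visits, so novel states are rewarded more than familiar ones; how this bonus is combined with the paper's influence-based intrinsic reward is described in the paper itself and is not reproduced here.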