Trajectory Diversity for Zero-Shot Coordination
Authors: Andrei Lupu, Brandon Cui, Hengyuan Hu, Jakob Foerster
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TrajeDi experimentally. Thanks to two MDPs and a matrix game, we provide empirical insights into the shortcomings of standard approaches and show the suitability of TrajeDi in discovering multiple optimal solutions. Afterwards, we proceed to demonstrate that TrajeDi scales well to arbitrarily complex settings by using it to improve ZSC scores in the collaborative partially observable card game Hanabi. |
| Researcher Affiliation | Collaboration | 1Mila, Mc Gill University (Work done while at Facebook AI Research) 2Facebook AI Research. |
| Pseudocode | Yes | We present the full TrajeDi PBT procedure for ZSC in algorithm 1. Algorithm 1: TrajeDi PBT with Common Best Response (hedged sketches of the population update and the common best response follow this table). |
| Open Source Code | Yes | The code is available online and can be run in-browser: https://bit.ly/33NBw5o |
| Open Datasets | Yes | Finally, we apply algorithm 1 to improve ZSC in Hanabi. We chose this game because it was recently proposed as a challenge in artificial intelligence (Bard et al., 2020), and because it was studied by Hu et al. in the context of ZSC. |
| Dataset Splits | No | The paper discusses training and test (cross-play) performance but does not explicitly mention or detail a separate validation dataset split. |
| Hardware Specification | No | The paper mentions that training is 'very compute intensive' and 'it uses 2 GPUs per agent', but it does not specify the model or type of GPUs or any other hardware components (e.g., CPU, RAM). |
| Software Dependencies | No | The paper does not explicitly provide specific version numbers for any software dependencies or libraries used in their experiments. |
| Experiment Setup | Yes | We implement simple policy-gradient policies and train 10 populations of n agents... we use γ = 1. ...we put a high weight on the TrajeDi loss term (α = 4 in eq. 7)... we train four independent pools of TrajeDi-regularized policies of size 3... all our policies are trained with OP augmented with an auxiliary task... we prevent agents from seeing the last action used by the partner. (See the loss-weighting sketch after this table.) |
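To make the quoted setup more concrete, below is a minimal sketch of the kind of objective the "Experiment Setup" row describes: a small population of softmax policies trained on a one-shot lever-coordination matrix game, with a Jensen-Shannon-style diversity term weighted by α (the paper uses α = 4 in eq. 7). This is an illustrative assumption, not the authors' code: the lever game, `POP_SIZE`, `N_ACTIONS`, and `jsd` are all placeholders, and in the one-shot case the paper's discounted trajectory diversity (with γ = 1) reduces to a plain divergence over action distributions. In the paper itself the diversity is computed over full trajectories and the policies are neural networks trained in Hanabi; the sketch only mirrors the loss structure.

```python
# Illustrative sketch only -- not the authors' implementation.
# A population of softmax policies plays a one-shot "lever" coordination game
# (payoff 1 if both copies of a policy pick the same lever) and is trained to
# maximise expected return plus an alpha-weighted Jensen-Shannon diversity term.
import torch

POP_SIZE, N_ACTIONS, ALPHA, LR, STEPS = 3, 5, 4.0, 0.1, 2000  # alpha = 4 as in eq. 7

logits = torch.randn(POP_SIZE, N_ACTIONS, requires_grad=True)  # random init breaks symmetry
opt = torch.optim.Adam([logits], lr=LR)

def jsd(p):
    """Jensen-Shannon-style divergence of a batch of categorical distributions."""
    m = p.mean(dim=0, keepdim=True)
    kl = (p * (p.clamp_min(1e-8).log() - m.clamp_min(1e-8).log())).sum(dim=1)
    return kl.mean()

for _ in range(STEPS):
    probs = torch.softmax(logits, dim=1)
    # Expected self-play return of each member: P(both copies choose the same lever).
    returns = (probs * probs).sum(dim=1)
    # Maximise mean return plus the diversity bonus (gradient descent on the negation).
    loss = -(returns.mean() + ALPHA * jsd(probs))
    opt.zero_grad()
    loss.backward()
    opt.step()

# With a large enough diversity weight, members tend to settle on different levers.
print(torch.softmax(logits, dim=1).argmax(dim=1))
```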
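Algorithm 1 additionally maintains a single "common best response" trained against the whole population. Under the same toy assumptions (a frozen, hand-written population of lever policies; all names hypothetical), a sketch of that step is simply a policy that maximises its expected matching payoff against a uniformly sampled population member:

```python
# Illustrative sketch only -- not the authors' implementation.
# Given a frozen population of lever policies, train one "common best response"
# that maximises its expected matching payoff against a uniformly sampled member.
import torch

population = torch.tensor([            # hand-written, near-deterministic lever policies
    [0.80, 0.10, 0.05, 0.025, 0.025],
    [0.10, 0.80, 0.05, 0.025, 0.025],
    [0.30, 0.10, 0.50, 0.050, 0.050],
])

br_logits = torch.zeros(population.shape[1], requires_grad=True)
opt = torch.optim.Adam([br_logits], lr=0.1)

for _ in range(500):
    br = torch.softmax(br_logits, dim=0)
    # Expected payoff: probability the best response matches a random member's lever.
    payoff = (population * br).sum(dim=1).mean()
    opt.zero_grad()
    (-payoff).backward()
    opt.step()

# The best response concentrates on the lever the population agrees on most often.
print(torch.softmax(br_logits, dim=0).argmax())
```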