Truthful Self-Play

Authors: Shohei Ohsawa

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Numerical experiments with predator prey, traffic junction, and StarCraft tasks demonstrate the state-of-the-art performance of our framework. As the second contribution, based on the results of the numerical experiments, we report that TSP achieved state-of-the-art performance for various multi-agent tasks made up of up to 20 agents (Section 5).
Researcher Affiliation Industry Shohei Ohsawa, Founder & CEO, Daisy AI, 6-13-9 Ginza, Chuo-ku, Tokyo, Japan, o@daisy.inc
Pseudocode Yes We show the whole procedure in Algorithm 1. Algorithm 1 The truthful self-play (TSP).
Open Source Code No The paper does not provide a specific link or explicit statement about releasing the source code for the described methodology.
Open Datasets Yes Using predator prey (Barrett et al., 2011), traffic junction (Sukhbaatar et al., 2016; Singh et al., 2019), and StarCraft (Synnaeve et al., 2016) environments, which are typically used in Comm-POSG research, we compared the performances of TSP with the current neural nets.
Dataset Splits No The paper mentions the use of specific environments (predator prey, traffic junction, StarCraft) but does not provide explicit details on dataset splits (e.g., percentages or sample counts for training, validation, or testing sets).
Hardware Specification No We performed 2,000 epochs of experiment with 500 steps, each using 120 CPUs. (No specific CPU model or other hardware details are provided).
Software Dependencies No The paper mentions "deep learning software libraries such as TensorFlow and PyTorch" but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup Yes Table 5: Hyperparameters used in the experiment. β is grid-searched over {0.1, 1, 10, 100}, and the best parameter is shown; the other parameters are not adjusted.
  Agents n: {3, 5, 10, 20}
  Observation x_t^i ∈ X: R^9
  Internal state h_t^i ∈ H: R^64
  Message z_t^i ∈ Z: R^64
  Actions a_t^i ∈ A: { , , , , a_stop}
  True state s_t ∈ S: {0, 1}^25 400
  Episode length T: 20
  Learning rate α: 0.001
  Truthful rate β: 10
  Discount rate γ: 1.0
  Metrics ψ ∈ H: [ ]
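The reported hyperparameters can be gathered into a small configuration sketch for anyone attempting a reproduction. The field names below are illustrative (not from the paper); only the values come from Table 5 and the experiment description:

```python
# Hyperparameter sketch for reproducing the TSP experiments.
# Values are taken from the paper's Table 5 and Section 5;
# the dictionary keys themselves are assumptions for readability.
TSP_CONFIG = {
    "n_agents_grid": [3, 5, 10, 20],   # agent counts evaluated
    "obs_dim": 9,                       # observation x_t^i in R^9
    "hidden_dim": 64,                   # internal state h_t^i in R^64
    "message_dim": 64,                  # message z_t^i in R^64
    "episode_length": 20,               # T
    "learning_rate": 0.001,             # alpha
    "truthful_rate": 10,                # beta, grid-searched over {0.1, 1, 10, 100}
    "discount_rate": 1.0,               # gamma
    "epochs": 2000,                     # reported training epochs
    "steps_per_epoch": 500,             # reported steps per epoch
}

# Total environment steps implied by the reported schedule.
total_steps = TSP_CONFIG["epochs"] * TSP_CONFIG["steps_per_epoch"]
```

Note that the grid-search space for β is the only tuned dimension the paper reports; all other values are fixed across tasks.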