Adversarial Cheap Talk
Authors: Chris Lu, Timon Willi, Alistair Letcher, Jakob Nicolaus Foerster
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ACT on three simple gym environments: Cartpole, Pendulum, and Reacher (Brockman et al., 2016). We also evaluate ACT on Minatar Breakout (Young and Tian, 2019; Lange, 2022b) to test ACT's ability to scale to higher-dimensional environments. The Victim is trained with Proximal Policy Optimisation (Schulman et al., 2017, PPO). The Adversary is trained using ES (Salimans et al., 2017). (A minimal sketch of the cheap-talk observation channel follows the table.) |
| Researcher Affiliation | Academia | ¹FLAIR, University of Oxford. ²aletcher.github.io. Correspondence to: Chris Lu <christopher.lu@eng.ox.ac.uk>. |
| Pseudocode | Yes | "Algorithm 1 Train-time ACT", "Algorithm 2 Test-time ACT", "Algorithm 3 Test-time Oracle PPO ACT", "Algorithm 4 Test-time Random Shaper" |
| Open Source Code | Yes | Project video and code are available at https://sites.google.com/view/adversarial-cheap-talk. |
| Open Datasets | Yes | We evaluate ACT on three simple gym environments: Cartpole, Pendulum, and Reacher (Brockman et al., 2016). We also evaluate ACT on Minatar Breakout (Young and Tian, 2019; Lange, 2022b)... |
| Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits. While it mentions training and testing, it does not specify a separate validation split or its size/methodology. |
| Hardware Specification | Yes | We train thousands of agents per minute on a single V100 GPU by vectorising both the PPO algorithm itself and the environments using Jax (Bradbury et al., 2018). ... For example, in Cartpole, we can simultaneously train 8192 PPO agents at a time on a single V100 GPU. Over 1024 generations of ES, this results in training 8,388,608 PPO agents from scratch in 2 hours on 4 V100 GPUs. (A minimal JAX vectorisation sketch follows the table.) |
| Software Dependencies | No | The paper mentions software components like Jax, PPO, gym environments, Minatar, and refers to specific implementations by Lange (evosax, gymnax). However, it does not provide specific version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | Appendix E. Hyperparameter Details, Table 1-4 provide specific parameter values for each environment, including 'Learning Rate', 'Population Size', 'Number of Generations', 'Outer Agent (OA) Hidden Layers', 'OA Size of Hidden Layers', 'OA Hidden Activation Function', 'Inner Agent (IA) Actor Hidden Layers', etc. |
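To make the setup quoted under Research Type concrete, here is a minimal sketch of the cheap-talk channel: the Adversary influences the Victim only through extra, deterministic message dimensions appended to the observation, leaving dynamics and rewards untouched. The `cheap_talk_obs` helper, the linear adversary parameters `adv_params`, and the shapes below are hypothetical stand-ins rather than the authors' released code; in the paper the Victim is trained with PPO and the Adversary's parameters are optimised with ES.

```python
# Minimal sketch (not the released implementation) of a cheap-talk channel:
# the adversary appends a bounded message to the victim's observation and
# cannot touch the transition dynamics or the reward.
import jax.numpy as jnp


def cheap_talk_obs(adv_params, obs):
    # Hypothetical linear adversary: deterministic message from the true observation.
    msg = jnp.tanh(obs @ adv_params["w"] + adv_params["b"])  # bounded message
    return jnp.concatenate([obs, msg])  # the victim sees [obs, message]


# Hypothetical shapes: a 4-dim Cartpole observation plus a 2-dim message.
adv_params = {"w": jnp.zeros((4, 2)), "b": jnp.zeros(2)}
augmented = cheap_talk_obs(adv_params, jnp.ones(4))  # shape (6,)
```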
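The Hardware Specification row cites vectorising both PPO and the environments with Jax. The sketch below illustrates that general pattern under stated assumptions: `jax.vmap` over independent rollouts so that one jitted call evaluates a whole population of 8192 agents on the accelerator. The `toy_env_step`, `policy`, and `rollout` functions are hypothetical stand-ins, not the paper's gymnax environments or PPO training loop.

```python
# Minimal sketch (not the authors' code): vectorising many independent rollouts
# with jax.vmap, in the spirit of "thousands of agents per minute on a single V100".
import jax
import jax.numpy as jnp

OBS_DIM, ACT_DIM, N_AGENTS = 4, 2, 8192  # 8192 parallel agents, as in the Cartpole example


def toy_env_step(state, action):
    # Hypothetical linear dynamics; a real setup would call a gymnax environment here.
    next_state = state + 0.01 * (action - 0.5)
    reward = -jnp.sum(next_state ** 2)
    return next_state, reward


def policy(params, obs):
    # One-layer deterministic policy, for illustration only.
    logits = obs @ params["w"] + params["b"]
    return jnp.argmax(logits).astype(jnp.float32)


def rollout(params, init_state, length=100):
    # Roll one agent forward for `length` steps and return its total reward.
    def step(state, _):
        action = policy(params, state)
        next_state, reward = toy_env_step(state, action)
        return next_state, reward

    _, rewards = jax.lax.scan(step, init_state, None, length=length)
    return rewards.sum()


# vmap over a batch of independent parameter sets and initial states, so one
# jitted call evaluates every agent in parallel on the accelerator.
batched_rollout = jax.jit(jax.vmap(rollout))

key = jax.random.PRNGKey(0)
k_w, k_s = jax.random.split(key)
params = {
    "w": jax.random.normal(k_w, (N_AGENTS, OBS_DIM, ACT_DIM)) * 0.1,
    "b": jnp.zeros((N_AGENTS, ACT_DIM)),
}
init_states = jax.random.normal(k_s, (N_AGENTS, OBS_DIM))
returns = batched_rollout(params, init_states)  # shape (8192,)
```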