Adversarial Cheap Talk

Authors: Chris Lu, Timon Willi, Alistair Letcher, Jakob Nicolaus Foerster

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate ACT on three simple gym environments: Cartpole, Pendulum, and Reacher (Brockman et al., 2016). We also evaluate ACT on Minatar Breakout (Young and Tian, 2019; Lange, 2022b) to test ACT's ability to scale to higher-dimensional environments. The Victim is trained with Proximal Policy Optimisation (Schulman et al., 2017, PPO). The Adversary is trained using ES (Salimans et al., 2017). (A sketch of this ES-over-PPO outer loop appears after the table.)
Researcher Affiliation | Academia | FLAIR, University of Oxford; aletcher.github.io. Correspondence to: Chris Lu <christopher.lu@eng.ox.ac.uk>.
Pseudocode | Yes | "Algorithm 1 Train-time ACT", "Algorithm 2 Test-time ACT", "Algorithm 3 Test-time Oracle PPO ACT", "Algorithm 4 Test-time Random Shaper"
Open Source Code | Yes | Project video and code are available at https://sites.google.com/view/adversarial-cheap-talk.
Open Datasets | Yes | We evaluate ACT on three simple gym environments: Cartpole, Pendulum, and Reacher (Brockman et al., 2016). We also evaluate ACT on Minatar Breakout (Young and Tian, 2019; Lange, 2022b)...
Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits. While it mentions training and testing, it does not specify a separate validation split or its size/methodology.
Hardware Specification | Yes | We train thousands of agents per minute on a single V100 GPU by vectorising both the PPO algorithm itself and the environments using Jax (Bradbury et al., 2018). ... For example, in Cartpole, we can simultaneously train 8192 PPO agents at a time on a single V100 GPU. Over 1024 generations of ES, this results in training 8,388,608 PPO agents from scratch in 2 hours on 4 V100 GPUs. (See the vectorisation sketch after the table.)
Software Dependencies | No | The paper mentions software components like Jax, PPO, gym environments, Minatar, and refers to specific implementations by Lange (evosax, gymnax). However, it does not provide specific version numbers for these software dependencies, which are required for reproducibility.
Experiment Setup | Yes | Appendix E (Hyperparameter Details), Tables 1-4, provides specific parameter values for each environment, including 'Learning Rate', 'Population Size', 'Number of Generations', 'Outer Agent (OA) Hidden Layers', 'OA Size of Hidden Layers', 'OA Hidden Activation Function', 'Inner Agent (IA) Actor Hidden Layers', etc.
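
To make the training setup quoted in the Research Type row more concrete, below is a minimal sketch, not the authors' released code, of an OpenAI-style ES outer loop (Salimans et al., 2017) over an Adversary's parameters. In train-time ACT the fitness of each candidate would be the return of a Victim trained from scratch with PPO while the Adversary appends cheap-talk features to its observations; here that inner PPO run is replaced by a hypothetical `train_victim_return` stub so the ES structure runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)


def train_victim_return(adversary_params: np.ndarray) -> float:
    """Hypothetical stand-in for the inner loop: in ACT this would train a PPO
    Victim from scratch in an environment whose observations the Adversary
    augments with cheap-talk features, then return the Victim's final return.
    Here it is a toy quadratic so the sketch is self-contained."""
    return -float(np.sum(adversary_params ** 2))


def es_step(params, pop_size=64, sigma=0.02, lr=0.01, attack=True):
    """One antithetic-sampling ES update (Salimans et al., 2017) of the Adversary."""
    eps = rng.standard_normal((pop_size // 2, params.size))
    eps = np.concatenate([eps, -eps])                     # antithetic pairs
    victim_returns = np.array(
        [train_victim_return(params + sigma * e) for e in eps]
    )
    # Train-time ACT minimises the Victim's return, so when attacking, the
    # Adversary's fitness is the negated return.
    fitness = -victim_returns if attack else victim_returns
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)  # standardise
    grad = (fitness[:, None] * eps).sum(axis=0) / (pop_size * sigma)
    return params + lr * grad                             # ascend Adversary fitness


adversary = rng.standard_normal(16)       # flat Adversary parameters (toy size)
for generation in range(1024):            # cf. "Number of Generations" in Appendix E
    adversary = es_step(adversary)
```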
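
The hardware row quotes the paper's claim that thousands of PPO agents are trained in parallel by vectorising both the algorithm and the environments with Jax. The sketch below illustrates only that vectorisation pattern, not the paper's PPO or gymnax code: a toy training run is written as a pure function of its random seed, then jit-compiled and vmapped over 8192 seeds, the same mechanism that lets a single GPU train many agents simultaneously.

```python
import jax
import jax.numpy as jnp


def train_one_agent(seed: jnp.ndarray) -> jnp.ndarray:
    """Toy stand-in for one full training run: gradient descent on a random
    quadratic, returning a final score. In the paper this slot would be an
    entire PPO run in a vectorised (gymnax-style) environment."""
    key = jax.random.PRNGKey(seed)
    target = jax.random.normal(key, (4,))        # per-seed random objective
    params = jnp.zeros(4)

    def step(p, _):
        loss, grads = jax.value_and_grad(lambda q: jnp.sum((q - target) ** 2))(p)
        return p - 0.1 * grads, loss

    params, losses = jax.lax.scan(step, params, None, length=100)
    return -losses[-1]                           # higher is better


# One jit-compiled, batched call "trains" 8192 independent agents at once;
# this vmap-over-seeds pattern is what the hardware row describes.
seeds = jnp.arange(8192)
scores = jax.jit(jax.vmap(train_one_agent))(seeds)
print(scores.shape)  # (8192,)
```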