Adversarial Cheap Talk
Authors: Chris Lu, Timon Willi, Alistair Letcher, Jakob Nicolaus Foerster
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ACT on three simple gym environments: Cartpole, Pendulum, and Reacher (Brockman et al., 2016). We also evaluate ACT on Minatar Breakout (Young and Tian, 2019; Lange, 2022b) to test ACT's ability to scale to higher-dimensional environments. The Victim is trained with Proximal Policy Optimisation (Schulman et al., 2017, PPO). The Adversary is trained using ES (Salimans et al., 2017). (A minimal sketch of the cheap-talk observation channel follows the table.) |
| Researcher Affiliation | Academia | ¹FLAIR, University of Oxford. ²aletcher.github.io. Correspondence to: Chris Lu <christopher.lu@eng.ox.ac.uk>. |
| Pseudocode | Yes | "Algorithm 1 Train-time ACT", "Algorithm 2 Test-time ACT", "Algorithm 3 Test-time Oracle PPO ACT", "Algorithm 4 Test-time Random Shaper" |
| Open Source Code | Yes | Project video and code are available at https://sites.google.com/view/adversarial-cheap-talk. |
| Open Datasets | Yes | We evaluate ACT on three simple gym environments: Cartpole, Pendulum, and Reacher (Brockman et al., 2016). We also evaluate ACT on Minatar Breakout (Young and Tian, 2019; Lange, 2022b)... |
| Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits. While it mentions training and testing, it does not specify a separate validation split or its size/methodology. |
| Hardware Specification | Yes | We train thousands of agents per minute on a single V100 GPU by vectorising both the PPO algorithm itself and the environments using Jax (Bradbury et al., 2018). ... For example, in Cartpole, we can simultaneously train 8192 PPO agents at a time on a single V100 GPU. Over 1024 generations of ES, this results in training 8,388,608 PPO agents from scratch in 2 hours on 4 V100 GPUs. (A minimal JAX vectorisation sketch follows the table.) |
| Software Dependencies | No | The paper mentions software components like Jax, PPO, gym environments, Minatar, and refers to specific implementations by Lange (evosax, gymnax). However, it does not provide specific version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | Appendix E. Hyperparameter Details, Table 1-4 provide specific parameter values for each environment, including 'Learning Rate', 'Population Size', 'Number of Generations', 'Outer Agent (OA) Hidden Layers', 'OA Size of Hidden Layers', 'OA Hidden Activation Function', 'Inner Agent (IA) Actor Hidden Layers', etc. |
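To make the setup quoted under Research Type concrete, here is a minimal sketch of the cheap-talk channel: the Adversary influences the Victim only through extra, deterministic message dimensions appended to the observation, leaving dynamics and rewards untouched. The `cheap_talk_obs` helper, the linear adversary parameters `adv_params`, and the shapes below are hypothetical stand-ins rather than the authors' released code; in the paper the Victim is trained with PPO and the Adversary's parameters are optimised with ES.

```python
# Minimal sketch (not the released implementation) of a cheap-talk channel:
# the adversary appends a bounded message to the victim's observation and
# cannot touch the transition dynamics or the reward.
import jax.numpy as jnp


def cheap_talk_obs(adv_params, obs):
    # Hypothetical linear adversary: deterministic message from the true observation.
    msg = jnp.tanh(obs @ adv_params["w"] + adv_params["b"])  # bounded message
    return jnp.concatenate([obs, msg])  # the victim sees [obs, message]


# Hypothetical shapes: a 4-dim Cartpole observation plus a 2-dim message.
adv_params = {"w": jnp.zeros((4, 2)), "b": jnp.zeros(2)}
augmented = cheap_talk_obs(adv_params, jnp.ones(4))  # shape (6,)
```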
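The Hardware Specification row cites vectorising both PPO and the environments with Jax. The sketch below illustrates that general pattern under stated assumptions: `jax.vmap` over independent rollouts so that one jitted call evaluates a whole population of 8192 agents on the accelerator. The `toy_env_step`, `policy`, and `rollout` functions are hypothetical stand-ins, not the paper's gymnax environments or PPO training loop.

```python
# Minimal sketch (not the authors' code): vectorising many independent rollouts
# with jax.vmap, in the spirit of "thousands of agents per minute on a single V100".
import jax
import jax.numpy as jnp

OBS_DIM, ACT_DIM, N_AGENTS = 4, 2, 8192  # 8192 parallel agents, as in the Cartpole example


def toy_env_step(state, action):
    # Hypothetical linear dynamics; a real setup would call a gymnax environment here.
    next_state = state + 0.01 * (action - 0.5)
    reward = -jnp.sum(next_state ** 2)
    return next_state, reward


def policy(params, obs):
    # One-layer deterministic policy, for illustration only.
    logits = obs @ params["w"] + params["b"]
    return jnp.argmax(logits).astype(jnp.float32)


def rollout(params, init_state, length=100):
    # Roll one agent forward for `length` steps and return its total reward.
    def step(state, _):
        action = policy(params, state)
        next_state, reward = toy_env_step(state, action)
        return next_state, reward

    _, rewards = jax.lax.scan(step, init_state, None, length=length)
    return rewards.sum()


# vmap over a batch of independent parameter sets and initial states, so one
# jitted call evaluates every agent in parallel on the accelerator.
batched_rollout = jax.jit(jax.vmap(rollout))

key = jax.random.PRNGKey(0)
k_w, k_s = jax.random.split(key)
params = {
    "w": jax.random.normal(k_w, (N_AGENTS, OBS_DIM, ACT_DIM)) * 0.1,
    "b": jnp.zeros((N_AGENTS, ACT_DIM)),
}
init_states = jax.random.normal(k_s, (N_AGENTS, OBS_DIM))
returns = batched_rollout(params, init_states)  # shape (8192,)
```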