Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents?
Authors: Ridhima Bector, Hang Xu, Abhay Aradhya, Chai Quek, Zinovi Rabinovich
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are based on a 3D Grid World domain and show: a) feasibility, i.e., despite the uncertainty, the attack forces a population-wide adoption of target behavior; b) efficacy, i.e., the attack is size-agnostic and transferable. |
| Researcher Affiliation | Academia | Ridhima Bector, Hang Xu, Abhay Aradhya, Chai Quek and Zinovi Rabinovich, Nanyang Technological University; {ridhima001, hang017}@e.ntu.edu.sg, {abhayaradhya, ashcquek, zinovi}@ntu.edu.sg |
| Pseudocode | No | The paper describes its methods textually and with diagrams (Figure 1, Figure 2) but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code and Appendices are available at bit.ly/github-rb-cep. |
| Open Datasets | Yes | This work tests and establishes the quality of the proposed methodology by training an attacker to learn to attack a population of navigational agents in a stochastic grid environment titled 3D Grid World [Rabinovich et al., 2010]. (A minimal victim-agent sketch follows the table.) |
| Dataset Splits | Yes | In this experiment, attack strategies are trained and tested on populations of the same size. Each strategy is tested on 20 populations. In the Implicit Collective scenario with Q-learning victim agents, 10 test populations use the same seed as the one used by the victim populations during training, while each agent in each of the remaining 10 populations uses a different seed. In the Swarm and True Collective scenarios with DQN victim agents, the neural networks of 10 test populations are initialized with random numbers drawn from the same range as used during training, while the remaining 10 test populations are initialized from a different range. (A hedged seed-split sketch follows the table.) |
| Hardware Specification | No | The paper describes the algorithms used (e.g., Q-learning, DQN) and the experimental setup, but it does not specify any hardware components like CPU models, GPU models, or memory sizes used for running the experiments. |
| Software Dependencies | No | The paper mentions learning algorithms like Q-learning and DQN, and concepts like variational autoencoders (VAE) and Wasserstein distance, but it does not specify any software packages or libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The attacker training episodes are 15-step sequential attacks on freshly initialized victim populations, wherein attack step 0 corresponds to the original environment with default dynamics. After each episode, the attack strategy employed in that episode is saved if it is better than or equal to the best attack strategy found so far with respect to the last-timestep, mean, or cumulative value of at least one strategy-quality criterion (a sketch of this checkpointing rule follows the table). Experiment H1 Concatenation vs Barycenter... In this experiment, attack strategies are trained and tested on populations of the same size. Each strategy is tested on 20 populations. In the Implicit Collective scenario with Q-learning victim agents, 10 test populations use the same seed as the one used by the victim populations during training, while each agent in each of the remaining 10 populations uses a different seed. In the Swarm and True Collective scenarios with DQN victim agents, the neural networks of 10 test populations are initialized with random numbers drawn from the same range as used during training, while the remaining 10 test populations are initialized from a different range. |
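
The paper attacks populations of Q-learning victims navigating the stochastic 3D Grid World of [Rabinovich et al., 2010], whose exact layout and dynamics are not specified in this summary. The following is a minimal sketch, assuming a simplified 2D stochastic grid and a tabular Q-learning victim; the class `GridWorld`, its slip probability, rewards, and all hyperparameters are illustrative assumptions, not the paper's environment.

```python
import random

class GridWorld:
    """Toy stochastic grid used only to illustrate the victim-agent interface.
    This is NOT the 3D Grid World of [Rabinovich et al., 2010]; layout,
    rewards, and slip probability are illustrative assumptions."""

    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

    def __init__(self, size=5, slip_prob=0.1, seed=0):
        self.size, self.slip_prob = size, slip_prob
        self.goal = (size - 1, size - 1)
        self.rng = random.Random(seed)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        # Stochastic transition: with probability slip_prob a random action is executed.
        if self.rng.random() < self.slip_prob:
            action = self.rng.randrange(len(self.ACTIONS))
        dr, dc = self.ACTIONS[action]
        r = min(max(self.state[0] + dr, 0), self.size - 1)
        c = min(max(self.state[1] + dc, 0), self.size - 1)
        self.state = (r, c)
        done = self.state == self.goal
        return self.state, (1.0 if done else -0.01), done


def train_q_victim(env, episodes=200, max_steps=200, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning victim; the seed makes whole populations reproducible."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated value
    n_actions = len(env.ACTIONS)
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: q.get((s, x), 0.0))
            s2, reward, done = env.step(a)
            best_next = 0.0 if done else max(q.get((s2, x), 0.0) for x in range(n_actions))
            td_target = reward + gamma * best_next
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (td_target - q.get((s, a), 0.0))
            s = s2
            if done:
                break
    return q

# A "population" of victims is then simply many such agents with different seeds:
# population = [train_q_victim(GridWorld(seed=i), seed=i) for i in range(10)]
```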
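The Dataset Splits row describes a 10/10 split of the 20 test populations: one half reuses the training seed (Q-learning victims) or the training initialization range (DQN victims), the other half does not. Below is a hedged sketch of how such splits could be constructed; the function names, population sizes, seed values, weight-initialization ranges, and layer shapes are assumptions for illustration only.

```python
import random

def make_q_test_populations(train_seed, n_agents=10, n_pops=20):
    """Implicit Collective (Q-learning victims): 10 test populations reuse the
    training seed; in the remaining 10, every agent gets a fresh seed.
    Concrete seed values are placeholders, not the paper's."""
    populations = []
    for p in range(n_pops):
        if p < n_pops // 2:
            seeds = [train_seed] * n_agents                              # same seed as training
        else:
            seeds = [random.randrange(10**6) for _ in range(n_agents)]   # per-agent fresh seeds
        populations.append(seeds)
    return populations


def init_dqn_weights(rng, shape, low, high):
    """Uniformly sample one weight matrix from the given range."""
    return [[rng.uniform(low, high) for _ in range(shape[1])] for _ in range(shape[0])]


def make_dqn_test_populations(train_range=(-0.1, 0.1), alt_range=(-0.5, 0.5),
                              n_pops=20, layer_shape=(4, 8), seed=0):
    """Swarm / True Collective (DQN victims): 10 test populations reuse the
    training initialization range, the remaining 10 use a different range."""
    rng = random.Random(seed)
    pops = []
    for p in range(n_pops):
        lo, hi = train_range if p < n_pops // 2 else alt_range
        pops.append(init_dqn_weights(rng, layer_shape, lo, hi))
    return pops
```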
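The Experiment Setup row states that an episode's attack strategy is kept if it matches or beats the best strategy so far on the last-timestep, mean, or cumulative value of at least one quality criterion. The sketch below implements that checkpointing rule under stated assumptions: the attacker/victim interaction is abstracted into a user-supplied `run_episode` callable, and the criterion name `adoption_rate` is a hypothetical placeholder, not a metric named by the paper.

```python
from statistics import mean

def summarize(history):
    """Reduce one criterion's per-attack-step values (step 0 is the unattacked
    environment) to the three summaries used for checkpointing."""
    return {"last": history[-1], "mean": mean(history), "cumulative": sum(history)}

def is_better_or_equal(candidate, best):
    """Keep the candidate if it is >= the best so far on at least one summary
    of at least one criterion."""
    if best is None:
        return True
    return any(candidate[c][s] >= best[c][s]
               for c in candidate for s in ("last", "mean", "cumulative"))

def train_attacker(run_episode, n_episodes, criteria=("adoption_rate",)):
    """`run_episode()` is a placeholder: it should run one 15-step sequential
    attack on a freshly initialized victim population and return
    (strategy, {criterion: [value per attack step]})."""
    best_summary, best_strategy = None, None
    for _ in range(n_episodes):
        strategy, per_step = run_episode()
        summary = {c: summarize(per_step[c]) for c in criteria}
        if is_better_or_equal(summary, best_summary):
            best_summary, best_strategy = summary, strategy
    return best_strategy

# Example with a dummy stand-in for the real attacker/victim loop:
# best = train_attacker(lambda: ("strategy-0", {"adoption_rate": [0.1, 0.2, 0.3]}), n_episodes=5)
```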