Tactics of Adversarial Attack on Deep Reinforcement Learning Agents
Authors: Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun
IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed tactics to agents trained by state-of-the-art deep reinforcement learning algorithms, including DQN and A3C. In 5 Atari games, our strategically-timed attack reduces as much reward as the uniform attack (i.e., attacking at every time step) does by attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. |
| Researcher Affiliation | Collaboration | Yen-Chen Lin1, Zhang-Wei Hong1, Yuan-Hong Liao1, Meng-Li Shih1, Ming-Yu Liu2, Min Sun1. 1National Tsing Hua University, Taiwan; 2NVIDIA, Santa Clara, California, USA |
| Pseudocode | No | No explicit pseudocode or algorithm block is present, although mathematical formulations for optimization problems and functions are provided. |
| Open Source Code | No | The paper states 'Our implementation will be released.' but does not provide a concrete link or access at the time of publication. |
| Open Datasets | Yes | We evaluated our tactics of adversarial attack to deep RL agents on 5 different Atari 2600 games (i.e., Ms Pacman, Pong, Seaquest, Qbert, and Chopper Command) using Open AI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper describes training and evaluation on Atari games, but does not explicitly provide numerical training/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | No specific hardware details (GPU, CPU models, memory, etc.) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions software components like Open AI Gym, A3C, and DQN algorithms but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The input to the neural network at time t was the concatenation of the last 4 images. Each of the images was resized to 84 × 84. The pixel value was rescaled to [0, 1]... We early stopped the optimizer when D(s, s + δ) < ϵ, where ϵ is a small value set to 0.007. The value of temperature T in Equation (4) is set to 1 in the experiments. (A minimal sketch of this setup appears after the table.) |
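
The Experiment Setup row describes a standard Atari preprocessing pipeline (last 4 frames, each resized to 84 × 84 and rescaled to [0, 1]) and an early-stopping rule for the adversarial optimizer (stop once D(s, s + δ) < ϵ with ϵ = 0.007). The sketch below illustrates both; it assumes OpenCV for resizing, and the `optimize_step` and `distance` callables are hypothetical stand-ins for the paper's optimizer and distance metric D. It is not the authors' released implementation.

```python
import numpy as np
import cv2  # assumed for resizing; the paper does not name the library used


def preprocess_frames(raw_frames):
    """Build the network input described in the paper: the last 4 game
    frames, each converted to grayscale, resized to 84x84, and rescaled
    to [0, 1], then stacked along the channel axis."""
    processed = []
    for frame in raw_frames[-4:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        small = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
        processed.append(small.astype(np.float32) / 255.0)  # rescale to [0, 1]
    return np.stack(processed, axis=-1)  # shape (84, 84, 4)


def perturb_with_early_stop(s, optimize_step, distance, eps=0.007, max_iters=100):
    """Illustrative early-stopping loop: refine the perturbation delta and
    stop as soon as D(s, s + delta) < eps (eps = 0.007 in the paper).
    `optimize_step` and `distance` are hypothetical callables; `max_iters`
    is an assumed safety cap, not a value from the paper."""
    delta = np.zeros_like(s)
    for _ in range(max_iters):
        delta = optimize_step(s, delta)
        if distance(s, s + delta) < eps:
            break
    return s + delta
```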