Deep Reinforcement Learning Policies Learn Shared Adversarial Features across MDPs
Authors: Ezgi Korkmaz
AAAI 2022, pp. 7229-7238 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments in various games from Arcade Learning Environment, and discover that high sensitivity directions for neural policies are correlated across MDPs. ... Our main contributions are as follows: ... Via experiments in the Arcade Learning Environment we rigorously show that the high-sensitivity directions computed in our framework correlate strongly across states and in several cases across MDPs. |
| Researcher Affiliation | Academia | The paper lists the author as Ezgi Korkmaz. No institutional affiliation or email domain is provided within the paper's text to determine the author's affiliation type. |
| Pseudocode | Yes | Algorithm 1: High-sensitivity directions with the A_random algorithm (a hedged sketch of this kind of direction search appears after the table) |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of source code for the methodology described. |
| Open Datasets | Yes | The Arcade Learning Environment (ALE) is used as a standard baseline... In our experiments agents are trained with Double Deep Q-Network (DDQN) proposed by Wang et al. (2016) with prioritized experience replay Schaul et al. (2016) in the ALE introduced by Bellemare et al. (2013) with the Open AI baselines version Brockman et al. (2016). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits; this is typical of reinforcement learning, where data is generated through environment interaction rather than drawn from a static dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like 'Double Deep Q-Network (DDQN)', 'prioritized experience replay', and 'Open AI baselines version', but it does not provide specific version numbers for these components. |
| Experiment Setup | No | The paper names the algorithms used for training agents (DDQN, SA-DDQN) and sets an ℓ2-norm bound κ for its framework, but it does not specify detailed experimental setup parameters such as learning rates, batch sizes, optimizers, or other hyperparameters for agent training. |
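The pseudocode row only names the paper's Algorithm 1 (high-sensitivity directions computed with a random-direction baseline), and the report reproduces no further detail. The snippet below is therefore a minimal sketch of what such a search could look like under the ℓ2-norm bound κ mentioned in the experiment-setup row. The function name `random_high_sensitivity_direction`, the toy Q-network, the default κ value, and the max-|ΔQ| sensitivity measure are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch (not the paper's exact Algorithm 1): sample random perturbation
# directions, scale each to an l2-norm bound kappa, and keep the direction that
# most changes the Q-network's output on a given state. Architecture, kappa, and
# the sensitivity measure are illustrative assumptions.
import torch
import torch.nn as nn


def random_high_sensitivity_direction(q_network: nn.Module,
                                      state: torch.Tensor,
                                      kappa: float = 0.05,
                                      num_samples: int = 100) -> torch.Tensor:
    """Return the kappa-scaled random direction that perturbs Q-values the most."""
    with torch.no_grad():
        q_clean = q_network(state.unsqueeze(0))  # baseline Q-values for the clean state
        best_dir, best_change = None, -float("inf")
        for _ in range(num_samples):
            direction = torch.randn_like(state)                   # isotropic random direction
            direction = kappa * direction / direction.norm(p=2)   # project onto l2 ball of radius kappa
            q_pert = q_network((state + direction).unsqueeze(0))
            change = (q_pert - q_clean).abs().max().item()        # sensitivity proxy: largest Q-value shift
            if change > best_change:
                best_change, best_dir = change, direction
    return best_dir


if __name__ == "__main__":
    # Toy Q-network on a flattened 84x84x4 Atari-style observation; the shapes are
    # placeholders, not the DDQN architecture used in the paper.
    toy_q = nn.Sequential(nn.Linear(84 * 84 * 4, 256), nn.ReLU(), nn.Linear(256, 6))
    obs = torch.rand(84 * 84 * 4)
    direction = random_high_sensitivity_direction(toy_q, obs)
    print(direction.norm(p=2))  # should be approximately kappa
```

In the paper's framework the search is run on trained DDQN/SA-DDQN policies and the resulting directions are then compared across states and across MDPs; the sketch above only illustrates the per-state search step.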