Deep Reinforcement Learning Policies Learn Shared Adversarial Features across MDPs

Authors: Ezgi Korkmaz

AAAI 2022, pp. 7229-7238

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments in various games from Arcade Learning Environment, and discover that high sensitivity directions for neural policies are correlated across MDPs. ... Our main contributions are as follows: ... Via experiments in the Arcade Learning Environment we rigorously show that the high-sensitivity directions computed in our framework correlate strongly across states and in several cases across MDPs.
Researcher Affiliation | Academia | The paper lists the author as Ezgi Korkmaz. No institutional affiliation or email domain is provided within the paper's text to determine the author's affiliation type.
Pseudocode | Yes | Algorithm 1: High-sensitivity directions with A^random_alg (an illustrative sketch of such a direction computation follows this table)
Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of source code for the methodology described.
Open Datasets | Yes | The Arcade Learning Environment (ALE) is used as a standard baseline... In our experiments agents are trained with Double Deep Q-Network (DDQN) proposed by Wang et al. (2016) with prioritized experience replay Schaul et al. (2016) in the ALE introduced by Bellemare et al. (2013) with the Open AI baselines version Brockman et al. (2016). (An illustrative environment-setup sketch follows this table.)
Dataset Splits | No | The paper does not explicitly report training/validation/test dataset splits; this is typical for reinforcement learning, where data is generated through environment interaction rather than drawn from a static dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components like 'Double Deep Q-Network (DDQN)', 'prioritized experience replay', and the 'Open AI baselines version', but it does not provide specific version numbers for these components.
Experiment Setup | No | The paper names the algorithms used for training agents (DDQN, SA-DDQN) and sets an ℓ2-norm bound κ for its framework, but it does not specify detailed experimental setup parameters such as learning rates, batch sizes, optimizers, or other training hyperparameters.
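
Regarding the Pseudocode row: the paper's Algorithm 1 is not reproduced on this page, so the following is only a minimal sketch, assuming a PyTorch Q-network and an FGSM-style gradient step, of how an ℓ2-bounded high-sensitivity direction for a single state could be computed. The function name, toy network, and the cross-entropy surrogate loss are illustrative assumptions, not the paper's exact procedure; only the ℓ2-norm bound κ is taken from the paper's description.

```python
# Hypothetical sketch (NOT the paper's Algorithm 1): an FGSM-style,
# l2-bounded high-sensitivity direction for a Q-network policy at one state.
import torch
import torch.nn as nn
import torch.nn.functional as F


def high_sensitivity_direction(q_network: nn.Module,
                               state: torch.Tensor,
                               kappa: float = 0.05) -> torch.Tensor:
    """Return a perturbation of l2-norm kappa along the loss gradient at `state`."""
    state = state.clone().detach().requires_grad_(True)
    q_values = q_network(state)                      # shape: (1, n_actions)
    greedy_action = q_values.argmax(dim=1)           # action the policy would take
    # Surrogate loss whose gradient points toward changing the greedy action.
    loss = F.cross_entropy(q_values, greedy_action)
    loss.backward()
    grad = state.grad
    # Scale the gradient to the l2 ball of radius kappa (the paper's norm bound).
    return (kappa * grad / (grad.norm(p=2) + 1e-12)).detach()


if __name__ == "__main__":
    # Toy Q-network over a flattened 84x84 observation with 6 actions (illustrative only).
    q_net = nn.Sequential(nn.Flatten(), nn.Linear(84 * 84, 256),
                          nn.ReLU(), nn.Linear(256, 6))
    obs = torch.rand(1, 84, 84)
    delta = high_sensitivity_direction(q_net, obs)
    print(delta.norm(p=2))  # approximately kappa
```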
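
Regarding the Open Datasets and Software Dependencies rows: the paper reports DDQN with prioritized experience replay trained in the ALE via the OpenAI baselines, but pins no package versions. Below is a minimal sketch of instantiating a comparable Atari task with gym's standard preprocessing wrappers; the game choice, wrapper set, and gym API variant (classic 4-tuple step return) are assumptions rather than the paper's stated configuration.

```python
# Illustrative environment setup only; package versions are unspecified in the
# paper, so the gym API used here (reset returning obs, 4-tuple step) is assumed.
import gym
from gym.wrappers import AtariPreprocessing, FrameStack

env = gym.make("PongNoFrameskip-v4")                  # any ALE game; Pong is an assumption
env = AtariPreprocessing(env, frame_skip=4, grayscale_obs=True, scale_obs=True)
env = FrameStack(env, num_stack=4)                    # standard DQN-style frame stacking

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()                # placeholder for a trained DDQN policy
    obs, reward, done, info = env.step(action)
env.close()
```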