Training a Resilient Q-network against Observational Interference
Authors: Chao-Han Huck Yang, I-Te Danny Hung, Yi Ouyang, Pin-Yu Chen
AAAI 2022, pp. 8814-8822
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of CIQ in several benchmark DQN environments with different types of interferences as auxiliary labels. Our experimental results show that the proposed CIQ method could achieve higher performance and more resilience against observational interferences. |
| Researcher Affiliation | Collaboration | Chao-Han Huck Yang1, I-Te Danny Hung2, Yi Ouyang3, Pin-Yu Chen4 Georgia Institute of Technology1, Columbia University2, Preferred Networks America3, IBM Research4 |
| Pseudocode | Yes | The CIQ training procedure (Algorithm 1) and an advanced CIQ based on variational inference (Louizos et al. 2017) are described in Appendix B. A hedged sketch of a loss in this style appears after this table. |
| Open Source Code | Yes | Our demo code is available at github.com/huckiyang/Obs-Causal-Q-Network. |
| Open Datasets | Yes | "Our testing platforms were based on (a) Open AI Gym (Brockman et al. 2016), (b) Unity-3D environments (Juliani et al. 2018), (c) a 2D gaming environment (Brockman et al. 2016), and (d) visual learning from pixel inputs of cart pole."; "Banana Collector: The Banana collector is one of the Unity 3D baselines (Juliani et al. 2018), shown in Fig. 5 (a)."; and "Lunar Lander: Similar to the Atari gaming environments, Lunar Lander-v2 (Fig. 5 (b)) is a discrete-action environment from Open AI Gym (Brockman et al. 2016) to control a firing ejector with a targeted reward of 200." A minimal environment-setup sketch appears after this table. |
| Dataset Splits | No | The paper does not explicitly specify train/validation/test dataset splits or a cross-validation setup, stating only: "We train each agent 50 times and highlight its standard deviation with lighter colors. Each agent is trained until the target score (shown as the dashed black line) is reached or until 400 episodes." |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software platforms like Open AI Gym and Unity-3D environments but does not provide specific version numbers for these or any other software dependencies, libraries, or frameworks. |
| Experiment Setup | Yes | "We train each agent 50 times and highlight its standard deviation with lighter colors. Each agent is trained until the target score (shown as the dashed black line) is reached or until 400 episodes."; "We ensure all the models have the same number of 9.7 million parameters with careful fine-tuning to avoid model capacity issues."; and "where λ is a scaling constant and is set to 1 for simplicity." |
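For orientation, here is a minimal, self-contained sketch of the training budget quoted above: an agent interacting with one of the named Gym environments until the target score (200 for Lunar Lander-v2) or the 400-episode cap is reached. It assumes the classic `gym` API (Brockman et al. 2016) and substitutes a random policy for the paper's agent; `TARGET_SCORE`, `MAX_EPISODES`, and the random-action placeholder are illustrative, not taken from the authors' code.

```python
import gym

TARGET_SCORE = 200.0  # Lunar Lander target reward quoted in the paper
MAX_EPISODES = 400    # episode cap quoted in the paper

# LunarLander-v2 additionally requires the Box2D extra: pip install gym[box2d]
env = gym.make("LunarLander-v2")

for episode in range(MAX_EPISODES):
    obs = env.reset()
    done, episode_return = False, 0.0
    while not done:
        # Placeholder policy; a trained agent would pick actions from obs.
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        episode_return += reward
    if episode_return >= TARGET_SCORE:
        break  # stop early once the target score is reached
env.close()
```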
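The paper defines its actual objective in Algorithm 1 and Appendix B, which are not reproduced here. The sketch below only illustrates the general shape such an objective could take under our reading of the table: a standard TD loss plus a λ-weighted auxiliary loss on a binary interference label, with λ = 1 as quoted above. All names (`q_net`, `target_net`, `i_logits`, `ciq_style_loss`) are hypothetical, and the real CIQ objective may differ.

```python
import torch
import torch.nn.functional as F

LAMBDA = 1.0  # scaling constant λ, "set to 1 for simplicity" per the paper


def ciq_style_loss(q_net, target_net, batch, gamma=0.99):
    """Hypothetical combined objective: TD loss plus a λ-weighted auxiliary
    loss on a binary interference label i_t (the "auxiliary labels" quoted
    above). This is a sketch, not the paper's Algorithm 1."""
    obs, action, reward, next_obs, done, i_label = batch

    # q_net is assumed to return (Q-values, interference logits).
    q_values, i_logits = q_net(obs)
    q_sa = q_values.gather(1, action.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        next_q, _ = target_net(next_obs)
        td_target = reward + gamma * (1.0 - done) * next_q.max(dim=1).values

    td_loss = F.smooth_l1_loss(q_sa, td_target)
    aux_loss = F.binary_cross_entropy_with_logits(i_logits.squeeze(1), i_label)
    return td_loss + LAMBDA * aux_loss
```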