Training a Resilient Q-network against Observational Interference
Authors: Chao-Han Huck Yang, I-Te Danny Hung, Yi Ouyang, Pin-Yu Chen
AAAI 2022, pp. 8814-8822
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of CIQ in several benchmark DQN environments with different types of interferences as auxiliary labels. Our experimental results show that the proposed CIQ method could achieve higher performance and more resilience against observational interferences. |
| Researcher Affiliation | Collaboration | Chao-Han Huck Yang1, I-Te Danny Hung2, Yi Ouyang3, Pin-Yu Chen4 Georgia Institute of Technology1, Columbia University2, Preferred Networks America3, IBM Research4 |
| Pseudocode | Yes | The CIQ training procedure (Algorithm 1) and an advanced CIQ based on variational inference (Louizos et al. 2017) are described in Appendix B. A hedged sketch of a loss in this style appears after this table. |
| Open Source Code | Yes | Our demo code is available at github.com/huckiyang/Obs-Causal-Q-Network. |
| Open Datasets | Yes | "Our testing platforms were based on (a) Open AI Gym (Brockman et al. 2016), (b) Unity-3D environments (Juliani et al. 2018), (c) a 2D gaming environment (Brockman et al. 2016), and (d) visual learning from pixel inputs of cart pole."; "Banana Collector: The Banana collector is one of the Unity 3D baselines (Juliani et al. 2018), shown in Fig. 5 (a)."; and "Lunar Lander: Similar to the Atari gaming environments, Lunar Lander-v2 (Fig. 5 (b)) is a discrete-action environment from Open AI Gym (Brockman et al. 2016) to control a firing ejector with a targeted reward of 200." A minimal environment-setup sketch appears after this table. |
| Dataset Splits | No | The paper does not explicitly specify train/validation/test dataset splits or a cross-validation setup, stating only: "We train each agent 50 times and highlight its standard deviation with lighter colors. Each agent is trained until the target score (shown as the dashed black line) is reached or until 400 episodes." |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software platforms like Open AI Gym and Unity-3D environments but does not provide specific version numbers for these or any other software dependencies, libraries, or frameworks. |
| Experiment Setup | Yes | "We train each agent 50 times and highlight its standard deviation with lighter colors. Each agent is trained until the target score (shown as the dashed black line) is reached or until 400 episodes."; "We ensure all the models have the same number of 9.7 million parameters with careful fine-tuning to avoid model capacity issues."; and "where λ is a scaling constant and is set to 1 for simplicity." |
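For orientation, here is a minimal, self-contained sketch of the training budget quoted above: an agent interacting with one of the named Gym environments until the target score (200 for Lunar Lander-v2) or the 400-episode cap is reached. It assumes the classic `gym` API (Brockman et al. 2016) and substitutes a random policy for the paper's agent; `TARGET_SCORE`, `MAX_EPISODES`, and the random-action placeholder are illustrative, not taken from the authors' code.

```python
import gym

TARGET_SCORE = 200.0  # Lunar Lander target reward quoted in the paper
MAX_EPISODES = 400    # episode cap quoted in the paper

# LunarLander-v2 additionally requires the Box2D extra: pip install gym[box2d]
env = gym.make("LunarLander-v2")

for episode in range(MAX_EPISODES):
    obs = env.reset()
    done, episode_return = False, 0.0
    while not done:
        # Placeholder policy; a trained agent would pick actions from obs.
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        episode_return += reward
    if episode_return >= TARGET_SCORE:
        break  # stop early once the target score is reached
env.close()
```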
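The paper defines its actual objective in Algorithm 1 and Appendix B, which are not reproduced here. The sketch below only illustrates the general shape such an objective could take under our reading of the table: a standard TD loss plus a λ-weighted auxiliary loss on a binary interference label, with λ = 1 as quoted above. All names (`q_net`, `target_net`, `i_logits`, `ciq_style_loss`) are hypothetical, and the real CIQ objective may differ.

```python
import torch
import torch.nn.functional as F

LAMBDA = 1.0  # scaling constant λ, "set to 1 for simplicity" per the paper


def ciq_style_loss(q_net, target_net, batch, gamma=0.99):
    """Hypothetical combined objective: TD loss plus a λ-weighted auxiliary
    loss on a binary interference label i_t (the "auxiliary labels" quoted
    above). This is a sketch, not the paper's Algorithm 1."""
    obs, action, reward, next_obs, done, i_label = batch

    # q_net is assumed to return (Q-values, interference logits).
    q_values, i_logits = q_net(obs)
    q_sa = q_values.gather(1, action.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        next_q, _ = target_net(next_obs)
        td_target = reward + gamma * (1.0 - done) * next_q.max(dim=1).values

    td_loss = F.smooth_l1_loss(q_sa, td_target)
    aux_loss = F.binary_cross_entropy_with_logits(i_logits.squeeze(1), i_label)
    return td_loss + LAMBDA * aux_loss
```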