SVQN: Sequential Variational Soft Q-Learning Networks

Authors: Shiyu Huang, Hang Su, Jun Zhu, Ting Chen

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that SVQNs can utilize past information to help decision making for efficient inference, and outperforms other baselines on several challenging tasks. Our ablation study shows that SVQNs have the generalization ability over time and are robust to the disturbance of the observation.
Researcher Affiliation | Collaboration | Shiyu Huang, Hang Su, Jun Zhu, Ting Chen. Dept. of Comp. Sci. & Tech., BNRist Center, Institute for AI, THBI Lab, Tsinghua University. hsy17@mails.tsinghua.edu.cn; {suhangss, dcszj, tingchen}@tsinghua.edu.cn. J. Zhu and T. Chen are corresponding authors. J. Zhu is also with RealAI.
Pseudocode | No | The paper describes the algorithm and its components in text and with diagrams, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any specific repository link or an explicit statement about releasing source code for the described methodology.
Open Datasets | Yes | We evaluate our algorithm on flickering Atari (Hausknecht & Stone, 2015) and the ViZDoom platform (Kempka et al., 2016). Atari environments (Bellemare et al., 2013) are widely used as the benchmark for deep reinforcement learning algorithms due to their high dimensional observation spaces and numerous challenging tasks. (A sketch of a flickering-observation wrapper appears after this table.)
Dataset Splits | No | The paper mentions training steps and evaluation episodes but does not provide specific details on train/validation/test dataset splits (e.g., percentages, sample counts, or explicit mention of a validation set).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions the framework is "implemented by Python and Tensorflow" but does not specify version numbers for these software components or any other libraries.
Experiment Setup | Yes | For the recurrent neural networks, we use a sequence length of 5 for training. On Atari, all the algorithms train for 10,000,000 steps and run for 100 episodes during evaluation; on ViZDoom, they train for 300,000 steps and run for 20 episodes. The discount factor γ is set to 0.95, the learning rate is 0.0001, and the Adam optimizer (Kingma & Ba, 2014) is used for training. The network architecture for these tasks and training details are shown in Appendix D. (These hyperparameters are collected into a code sketch after this table.)
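
For readers unfamiliar with the flickering-Atari benchmark cited in the Open Datasets row: Hausknecht & Stone (2015) induce partial observability by fully obscuring each frame with some probability (0.5 in their paper). Below is a minimal sketch of such a wrapper, assuming the classic Gym Atari interface; the class name, environment id, and default probability are illustrative assumptions, not taken from SVQN's code.

```python
import numpy as np
import gym

class FlickeringWrapper(gym.ObservationWrapper):
    """Obscure each observation with probability p, in the spirit of the
    flickering-Atari POMDP benchmark (Hausknecht & Stone, 2015).
    The default p=0.5 follows that paper; nothing here is from SVQN's code."""

    def __init__(self, env, p=0.5):
        super().__init__(env)
        self.p = p

    def observation(self, obs):
        # With probability p, replace the frame with a blank (all-zero)
        # screen, forcing the agent to rely on memory of past observations.
        if np.random.rand() < self.p:
            return np.zeros_like(obs)
        return obs

# Hypothetical usage; any Atari environment id would do.
env = FlickeringWrapper(gym.make("PongNoFrameskip-v4"))
```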
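Since SVQN builds on soft Q-learning, the quoted hyperparameters can be collected into a short configuration sketch together with the standard soft Bellman target. This is a minimal illustration, not the authors' code: only the discount factor, learning rate, sequence length, and the choice of Adam come from the quoted setup, while the temperature `ALPHA`, function names, and batch shapes are assumptions.

```python
import tensorflow as tf

GAMMA = 0.95           # discount factor, quoted above
LEARNING_RATE = 1e-4   # learning rate, quoted above
SEQ_LEN = 5            # training sequence length for the recurrent nets, quoted above
ALPHA = 1.0            # soft Q-learning temperature; this value is an assumption

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)  # Kingma & Ba, 2014

def soft_q_target(rewards, next_q_values, done):
    """One-step soft Bellman target from soft Q-learning:
    y = r + gamma * (1 - done) * alpha * logsumexp(Q(s', .) / alpha).
    Shapes: rewards, done -> [batch]; next_q_values -> [batch, num_actions]."""
    soft_value = ALPHA * tf.reduce_logsumexp(next_q_values / ALPHA, axis=-1)
    return rewards + GAMMA * (1.0 - done) * soft_value

# Dummy call with placeholder shapes (batch of 4, 6 discrete actions):
y = soft_q_target(tf.zeros([4]), tf.random.normal([4, 6]), tf.zeros([4]))
```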