SVQN: Sequential Variational Soft Q-Learning Networks

Authors: Shiyu Huang, Hang Su, Jun Zhu, Ting Chen

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that SVQNs can utilize past information to help decision making for efficient inference, and outperforms other baselines on several challenging tasks. Our ablation study shows that SVQNs have the generalization ability over time and are robust to the disturbance of the observation.
Researcher Affiliation | Collaboration | Shiyu Huang, Hang Su, Jun Zhu, Ting Chen. Dept. of Comp. Sci. & Tech., BNRist Center, Institute for AI, THBI Lab, Tsinghua University. hsy17@mails.tsinghua.edu.cn; {suhangss, dcszj, tingchen}@tsinghua.edu.cn. J. Zhu and T. Chen are corresponding authors. J. Zhu is also with RealAI.
Pseudocode | No | The paper describes the algorithm and its components in text and with diagrams, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any specific repository link or an explicit statement about releasing source code for the described methodology.
Open Datasets | Yes | We evaluate our algorithm on flickering Atari (Hausknecht & Stone, 2015) and the ViZDoom platform (Kempka et al., 2016). Atari environments (Bellemare et al., 2013) are widely used as the benchmark for deep reinforcement learning algorithms due to their high dimensional observation spaces and numerous challenging tasks. (A sketch of a flickering-observation wrapper appears after this table.)
Dataset Splits | No | The paper mentions training steps and evaluation episodes but does not provide specific details on train/validation/test dataset splits (e.g., percentages, sample counts, or explicit mention of a validation set).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions the framework is "implemented by Python and Tensorflow" but does not specify version numbers for these software components or any other libraries.
Experiment Setup | Yes | For the recurrent neural networks, we use a sequence length of 5 for training. On Atari, all the algorithms train for 10,000,000 steps and run for 100 episodes during evaluation; on ViZDoom, they train for 300,000 steps and run for 20 episodes. The discount factor γ is set to 0.95, the learning rate is 0.0001, and the Adam optimizer (Kingma & Ba, 2014) is used for training. The network architecture for these tasks and training details are shown in Appendix D. (These hyperparameters are collected into a code sketch after this table.)
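
For readers unfamiliar with the flickering-Atari benchmark cited in the Open Datasets row: Hausknecht & Stone (2015) induce partial observability by fully obscuring each frame with some probability (0.5 in their paper). Below is a minimal sketch of such a wrapper, assuming the classic Gym Atari interface; the class name, environment id, and default probability are illustrative assumptions, not taken from SVQN's code.

```python
import numpy as np
import gym

class FlickeringWrapper(gym.ObservationWrapper):
    """Obscure each observation with probability p, in the spirit of the
    flickering-Atari POMDP benchmark (Hausknecht & Stone, 2015).
    The default p=0.5 follows that paper; nothing here is from SVQN's code."""

    def __init__(self, env, p=0.5):
        super().__init__(env)
        self.p = p

    def observation(self, obs):
        # With probability p, replace the frame with a blank (all-zero)
        # screen, forcing the agent to rely on memory of past observations.
        if np.random.rand() < self.p:
            return np.zeros_like(obs)
        return obs

# Hypothetical usage; any Atari environment id would do.
env = FlickeringWrapper(gym.make("PongNoFrameskip-v4"))
```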
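Since SVQN builds on soft Q-learning, the quoted hyperparameters can be collected into a short configuration sketch together with the standard soft Bellman target. This is a minimal illustration, not the authors' code: only the discount factor, learning rate, sequence length, and the choice of Adam come from the quoted setup, while the temperature `ALPHA`, function names, and batch shapes are assumptions.

```python
import tensorflow as tf

GAMMA = 0.95           # discount factor, quoted above
LEARNING_RATE = 1e-4   # learning rate, quoted above
SEQ_LEN = 5            # training sequence length for the recurrent nets, quoted above
ALPHA = 1.0            # soft Q-learning temperature; this value is an assumption

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)  # Kingma & Ba, 2014

def soft_q_target(rewards, next_q_values, done):
    """One-step soft Bellman target from soft Q-learning:
    y = r + gamma * (1 - done) * alpha * logsumexp(Q(s', .) / alpha).
    Shapes: rewards, done -> [batch]; next_q_values -> [batch, num_actions]."""
    soft_value = ALPHA * tf.reduce_logsumexp(next_q_values / ALPHA, axis=-1)
    return rewards + GAMMA * (1.0 - done) * soft_value

# Dummy call with placeholder shapes (batch of 4, 6 discrete actions):
y = soft_q_target(tf.zeros([4]), tf.random.normal([4, 6]), tf.zeros([4]))
```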