Self-Imitation Learning via Generalized Lower Bound Q-learning
Authors: Yunhao Tang
Venue: NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We seek to address the following questions in the experiments: (1) Does generalized SIL entail performance gains on both deterministic and stochastic actor-critic algorithms? (2) How do the design choices (e.g., hyper-parameters, prioritized replay) of generalized SIL impact its performance? (Section 5, Experiments) |
| Researcher Affiliation | Academia | Yunhao Tang, Columbia University, yt2541@columbia.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementation is available at https://github.com/robintyh1/nstep-sil. |
| Open Datasets | Yes | For benchmark tasks, we focus on state-based continuous control. In order to assess the strengths of different algorithmic variants, we consider similar tasks (Walker, Cheetah, and Ant) with different simulation backends from OpenAI Gym [31], DeepMind Control Suite [32], and the Bullet Physics Engine [33]. (See the environment-setup sketch below the table.) |
| Dataset Splits | No | The paper describes training and evaluation, but does not explicitly state the train/validation/test dataset split percentages or methods used for its experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions several software components and environments, including OpenAI Gym, DeepMind Control Suite, the Bullet Physics Engine, TD3, PPO, and Adam, but does not specify their version numbers. |
| Experiment Setup | Yes | Importantly, note that the weighting coefficient is fixed at η = 0.1 for all cases of generalized SIL. For general SIL, we adopt α = 0.6, β = 0.1 as in [8]. The final performance of algorithms after training (5 × 10^6 steps for HalfCheetah and 10^7 for the others) is shown in Figure 3. (See the hyper-parameter sketch below the table.) |
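
The benchmark backends named in the Open Datasets row can be constructed along the lines below. This is a minimal sketch: the specific environment IDs (`Walker2d-v3`, `Walker2DBulletEnv-v0`), the walker/walk task choice, and the package versions are assumptions, since the paper does not pin any of them.

```python
# Hedged sketch: constructing the three simulation backends named in the paper.
# Environment IDs and library versions are assumptions, not stated in the paper.
import gym                    # OpenAI Gym (MuJoCo-backed tasks)
from dm_control import suite  # DeepMind Control Suite
import pybullet_envs          # noqa: F401  (importing registers Bullet envs with gym)

gym_env = gym.make("Walker2d-v3")              # Gym walker task (assumed ID)
dmc_env = suite.load(domain_name="walker",     # DeepMind Control Suite walker task
                     task_name="walk")
bullet_env = gym.make("Walker2DBulletEnv-v0")  # Bullet Physics walker task (assumed ID)
```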
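
To illustrate where the quoted hyper-parameters enter training, here is a minimal PyTorch-style sketch of a lower-bound (self-imitation) regression term weighted by η = 0.1, with prioritized-replay importance weights built from the exponents α = 0.6 and β = 0.1. The function name `sil_lower_bound_loss` and the tensor shapes are hypothetical; this is not the authors' implementation, which is available in the repository linked in the Open Source Code row.

```python
import torch

# Values quoted in the Experiment Setup row above.
ETA, ALPHA, BETA = 0.1, 0.6, 0.1

def sil_lower_bound_loss(q_values, returns, priorities):
    """Hypothetical sketch of a self-imitation (lower-bound Q-learning) loss term.

    q_values:   Q(s, a) for the sampled transitions, shape [batch]
    returns:    (n-step) returns R used as lower-bound targets, shape [batch]
    priorities: replay priorities p_i of the sampled transitions, shape [batch]
    """
    # Prioritized-replay importance weights (Schaul et al. style), with the
    # sampled batch standing in for the full buffer for simplicity.
    probs = priorities.pow(ALPHA) / priorities.pow(ALPHA).sum()
    is_weights = (1.0 / (len(priorities) * probs)).pow(BETA)
    is_weights = is_weights / is_weights.max()

    # Lower-bound regression: only regress Q upward where the return exceeds it.
    gap = (returns - q_values).clamp(min=0.0)
    sil_loss = (is_weights * gap.pow(2)).mean()

    # The SIL term is added to the base actor-critic loss with fixed weight eta.
    return ETA * sil_loss
```

In use, the returned term would simply be added to the base TD3/PPO critic loss before the optimizer step.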