Self-Imitation Learning via Generalized Lower Bound Q-learning

Authors: Yunhao Tang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5 (Experiments): We seek to address the following questions in the experiments: (1) Does generalized SIL entail performance gains on both deterministic and stochastic actor-critic algorithms? (2) How do the design choices (e.g. hyper-parameters, prioritized replay) of generalized SIL impact its performance?
Researcher Affiliation | Academia | Yunhao Tang, Columbia University, yt2541@columbia.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation is available at https://github.com/robintyh1/nstep-sil.
Open Datasets | Yes | For benchmark tasks, we focus on state-based continuous control. In order to assess the strengths of different algorithmic variants, we consider similar tasks Walker, Cheetah and Ant with different simulation backends from Open AI gym [31], Deep Mind Control Suite [32] and Bullet Physics Engine [33].
Dataset Splits | No | The paper describes training and evaluation, but does not explicitly state the train/validation/test dataset split percentages or methods used for its experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions several software components and environments, such as OpenAI Gym, DeepMind Control Suite, Bullet Physics Engine, TD3, PPO, and Adam, but does not specify their version numbers.
Experiment Setup | Yes | Importantly, note that the weighting coefficient is fixed η = 0.1 for all cases of generalized SIL. For general SIL, we adopt α = 0.6, β = 0.1 as in [8]. The final performance of algorithms after training (5 × 10^6 steps for Half Cheetah and 10^7 for the others) are shown in Figure 3.
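
The simulation backends quoted under Open Datasets are all publicly available, so the benchmark tasks can be re-created directly. Below is a minimal sketch of how the Walker task could be instantiated on each backend; the specific environment IDs and task names are assumptions that depend on the installed gym, dm_control, and pybullet versions, not a configuration taken from the paper or its repository.

```python
import gym                      # OpenAI Gym [31]
from dm_control import suite    # DeepMind Control Suite [32]
import pybullet_envs            # Bullet Physics Engine [33]; importing registers Bullet envs with gym

# OpenAI Gym (MuJoCo backend); the version suffix depends on the installed gym release.
gym_walker = gym.make("Walker2d-v3")

# DeepMind Control Suite: tasks are addressed by (domain, task) pairs.
dmc_walker = suite.load(domain_name="walker", task_name="walk")

# Bullet Physics Engine: Bullet variants are exposed through the gym registry.
bullet_walker = gym.make("Walker2DBulletEnv-v0")
```

The Cheetah and Ant tasks follow the same pattern with the corresponding environment names on each backend.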
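
The hyper-parameters quoted under Experiment Setup (a fixed weighting coefficient η = 0.1 and prioritized-replay exponents α = 0.6, β = 0.1) can be related to the lower-bound objective the paper studies. The following is a rough sketch, not the repository's implementation: the function name, tensor shapes, and the way the SIL term is combined with the base TD loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hyper-parameters quoted in the paper's experiment setup.
ETA = 0.1          # weight on the SIL lower-bound term, fixed for all generalized SIL cases
PER_ALPHA = 0.6    # prioritized-replay exponent, as in SIL [8]
PER_BETA = 0.1     # importance-sampling exponent, as in SIL [8]

def generalized_sil_critic_loss(q_pred, td_target, nstep_return, is_weights):
    """Combine a standard TD critic loss with a clipped lower-bound (SIL) term.

    q_pred       : Q(s, a) predicted by the critic, shape (batch,)
    td_target    : ordinary bootstrapped TD target, shape (batch,)
    nstep_return : n-step return from the replay buffer, treated as a
                   lower bound on the optimal Q-value, shape (batch,)
    is_weights   : importance-sampling weights from prioritized replay
                   (computed with exponent PER_BETA), shape (batch,)
    """
    # Base TD error (in practice this would be the base algorithm's critic loss,
    # e.g. TD3's clipped double-Q loss).
    td_loss = F.mse_loss(q_pred, td_target)

    # Lower-bound Q-learning: only penalize when the observed return
    # exceeds the current estimate, i.e. max(R_n - Q(s, a), 0).
    bound_gap = torch.clamp(nstep_return - q_pred, min=0.0)
    sil_loss = (is_weights * bound_gap.pow(2)).mean()

    # In prioritized replay, bound_gap.detach() would typically serve as the
    # transition priority, with sampling probability proportional to
    # priority ** PER_ALPHA (handled by the buffer, outside this function).
    return td_loss + ETA * sil_loss
```

Holding η = 0.1 keeps the self-imitation term a mild addition to the base critic loss, consistent with the quoted statement that the coefficient is fixed for all cases of generalized SIL.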