Self-Imitation Learning via Generalized Lower Bound Q-learning
Authors: Yunhao Tang
Venue: NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We seek to address the following questions in the experiments: (1) Does generalized SIL entail performance gains on both deterministic and stochastic actor-critic algorithms? (2) How do the design choices (e.g., hyper-parameters, prioritized replay) of generalized SIL impact its performance? (Section 5, Experiments) |
| Researcher Affiliation | Academia | Yunhao Tang, Columbia University, yt2541@columbia.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementation is available at https://github.com/robintyh1/nstep-sil. |
| Open Datasets | Yes | For benchmark tasks, we focus on state-based continuous control. In order to assess the strengths of different algorithmic variants, we consider similar tasks (Walker, Cheetah, and Ant) with different simulation backends from OpenAI Gym [31], DeepMind Control Suite [32], and the Bullet Physics Engine [33]. (See the environment-setup sketch below the table.) |
| Dataset Splits | No | The paper describes training and evaluation, but does not explicitly state the train/validation/test dataset split percentages or methods used for its experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions several software components and environments, including OpenAI Gym, DeepMind Control Suite, the Bullet Physics Engine, TD3, PPO, and Adam, but does not specify their version numbers. |
| Experiment Setup | Yes | Importantly, note that the weighting coefficient is fixed at η = 0.1 for all cases of generalized SIL. For general SIL, we adopt α = 0.6, β = 0.1 as in [8]. The final performance of algorithms after training (5 × 10^6 steps for HalfCheetah and 10^7 for the others) is shown in Figure 3. (See the hyper-parameter sketch below the table.) |
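
The benchmark backends named in the Open Datasets row can be constructed along the lines below. This is a minimal sketch: the specific environment IDs (`Walker2d-v3`, `Walker2DBulletEnv-v0`), the walker/walk task choice, and the package versions are assumptions, since the paper does not pin any of them.

```python
# Hedged sketch: constructing the three simulation backends named in the paper.
# Environment IDs and library versions are assumptions, not stated in the paper.
import gym                    # OpenAI Gym (MuJoCo-backed tasks)
from dm_control import suite  # DeepMind Control Suite
import pybullet_envs          # noqa: F401  (importing registers Bullet envs with gym)

gym_env = gym.make("Walker2d-v3")              # Gym walker task (assumed ID)
dmc_env = suite.load(domain_name="walker",     # DeepMind Control Suite walker task
                     task_name="walk")
bullet_env = gym.make("Walker2DBulletEnv-v0")  # Bullet Physics walker task (assumed ID)
```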
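
To illustrate where the quoted hyper-parameters enter training, here is a minimal PyTorch-style sketch of a lower-bound (self-imitation) regression term weighted by η = 0.1, with prioritized-replay importance weights built from the exponents α = 0.6 and β = 0.1. The function name `sil_lower_bound_loss` and the tensor shapes are hypothetical; this is not the authors' implementation, which is available in the repository linked in the Open Source Code row.

```python
import torch

# Values quoted in the Experiment Setup row above.
ETA, ALPHA, BETA = 0.1, 0.6, 0.1

def sil_lower_bound_loss(q_values, returns, priorities):
    """Hypothetical sketch of a self-imitation (lower-bound Q-learning) loss term.

    q_values:   Q(s, a) for the sampled transitions, shape [batch]
    returns:    (n-step) returns R used as lower-bound targets, shape [batch]
    priorities: replay priorities p_i of the sampled transitions, shape [batch]
    """
    # Prioritized-replay importance weights (Schaul et al. style), with the
    # sampled batch standing in for the full buffer for simplicity.
    probs = priorities.pow(ALPHA) / priorities.pow(ALPHA).sum()
    is_weights = (1.0 / (len(priorities) * probs)).pow(BETA)
    is_weights = is_weights / is_weights.max()

    # Lower-bound regression: only regress Q upward where the return exceeds it.
    gap = (returns - q_values).clamp(min=0.0)
    sil_loss = (is_weights * gap.pow(2)).mean()

    # The SIL term is added to the base actor-critic loss with fixed weight eta.
    return ETA * sil_loss
```

In use, the returned term would simply be added to the base TD3/PPO critic loss before the optimizer step.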