Self-Imitation Learning
Authors: Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive to the state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks. |
| Researcher Affiliation | Collaboration | ¹University of Michigan, ²Google Brain. Correspondence to: Junhyuk Oh <junhyuk@umich.edu>, Yijie Guo <guoyijie@umich.edu>. |
| Pseudocode | Yes | Algorithm 1 Actor-Critic with Self-Imitation Learning |
| Open Source Code | Yes | The code is available on https://github.com/junhyukoh/self-imitation-learning. |
| Open Datasets | Yes | For Atari experiments, we used a 3-layer convolutional neural network used in DQN (Mnih et al., 2015) with last 4 stacked frames as input. ... on several hard exploration Atari games (Bellemare et al., 2013). ... Finally, SIL improves the performance of proximal policy optimization (PPO) on MuJoCo continuous control tasks (Brockman et al., 2016; Todorov et al., 2012). |
| Dataset Splits | No | The paper does not explicitly provide specific percentages, sample counts, or citations to predefined train/validation/test splits for the datasets used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions that their implementation is "based on OpenAI's baseline implementation (Dhariwal et al., 2017)" but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For Atari experiments, we used a 3-layer convolutional neural network used in DQN (Mnih et al., 2015) with last 4 stacked frames as input. We performed 4 self-imitation learning updates per on-policy actor-critic update (M = 4 in Algorithm 1). ... For MuJoCo experiments, we used an MLP which consists of 2 hidden layers with 64 units as in Schulman et al. (2017b). We performed 10 self-imitation learning updates per each iteration (batch). |
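
The Pseudocode row above points to Algorithm 1 (actor-critic with self-imitation learning). As a rough illustration of the self-imitation update that the algorithm interleaves with the on-policy step, the sketch below implements the SIL loss from the paper, -log π_θ(a|s)·(R - V_θ(s))₊ + (β^sil/2)·(R - V_θ(s))₊², on a batch sampled from a replay buffer of past transitions and their returns. This is a minimal sketch assuming PyTorch; `model`, `optimizer`, and `replay_batch` are illustrative names, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def sil_update(model, optimizer, replay_batch, beta_sil=0.01):
    """One self-imitation learning step on a batch of past (state, action, return) tuples."""
    states, actions, returns = replay_batch          # R: discounted return observed in a past episode
    logits = model.pi_logits(states)                 # unnormalized pi_theta(.|s), shape [B, num_actions]
    values = model.value(states).squeeze(-1)         # V_theta(s), shape [B]

    # (R - V)_+ : only imitate past actions whose return exceeded the current value estimate
    clipped_adv = torch.clamp(returns - values, min=0.0)

    log_prob = -F.cross_entropy(logits, actions, reduction="none")   # log pi_theta(a|s)
    policy_loss = -(log_prob * clipped_adv.detach()).mean()
    value_loss = 0.5 * (clipped_adv ** 2).mean()

    loss = policy_loss + beta_sil * value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Per the quoted setup, such an update would be called M = 4 times after each on-policy A2C update on Atari, and 10 times per iteration when combined with PPO on MuJoCo.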
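The Experiment Setup row describes the network shapes only in prose. Below is a minimal sketch of those torsos, again assuming PyTorch and standard 84x84 Atari preprocessing; the layer sizes follow the DQN architecture of Mnih et al. (2015) and the 2x64-unit MLP of Schulman et al. (2017b) cited above, and the names (`atari_torso`, `mujoco_torso`, `obs_dim`) are illustrative, not taken from the released code.

```python
import torch.nn as nn

# DQN-style 3-layer CNN over the last 4 stacked 84x84 frames (Atari);
# policy and value heads would attach to the 512-unit output.
atari_torso = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
)

# 2-hidden-layer MLP with 64 units each (MuJoCo); obs_dim is environment-specific,
# e.g. 11 for Hopper-v2 (an assumed example, not stated in the paper).
obs_dim = 11
mujoco_torso = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
)
```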