SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Authors: Siddharth Reddy, Anca D. Dragan, Sergey Levine

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that SQIL outperforms BC and achieves competitive results compared to GAIL, on a variety of image-based and low-dimensional tasks in Box2D, Atari, and MuJoCo. This paper is a proof of concept that illustrates how a simple imitation method based on RL with constant rewards can be as effective as more complex methods that use learned rewards.
Researcher Affiliation | Academia | Siddharth Reddy, Anca D. Dragan, Sergey Levine; Department of Electrical Engineering and Computer Science, University of California, Berkeley; {sgr,anca,svlevine}@berkeley.edu
Pseudocode | Yes | Algorithm 1: Soft Q Imitation Learning (SQIL). A hedged Python sketch of this training loop appears below the table.
Open Source Code | No | The paper mentions using and adapting existing open-source implementations (e.g., OpenAI Baselines) and pretrained policies, but it does not state that the code for SQIL or the authors' specific modifications is publicly released.
Open Datasets | Yes | We run experiments in four image-based environments (Car Racing, Pong, Breakout, and Space Invaders) and three low-dimensional environments (Humanoid, HalfCheetah, and Lunar Lander) (Brockman et al., 2016; Bellemare et al., 2013; Todorov et al., 2012).
Dataset Splits | No | The paper does not specify explicit train/validation/test splits (e.g., percentages or exact counts) needed for reproducibility, nor does it reference standard splits for the environments used; it only makes general mention of expert demonstrations and collected experience.
Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or memory sizes used for running the experiments.
Software Dependencies | No | The paper mentions algorithms and frameworks such as Adam, deep Q-learning, and soft actor-critic, but it does not provide version numbers for any software libraries or dependencies (e.g., Python, PyTorch, or TensorFlow).
Experiment Setup | Yes | For Lunar Lander, we set λsamp = 10^-6. For Car Racing, we set λsamp = 0.01. For all other environments, we set λsamp = 1. For Lunar Lander, we used a network architecture with two fully-connected layers containing 128 hidden units each to represent the Q network in SQIL, the policy and discriminator networks in GAIL, and the policy network in BC. An illustrative reconstruction of this setup appears below the table.
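As a reference for the Pseudocode row, below is a minimal, hedged Python sketch of the training loop described in Algorithm 1 (SQIL): expert demonstrations are stored with a constant reward of +1, newly sampled transitions with a constant reward of 0, and a soft Q-learning update is applied to balanced mini-batches drawn from both buffers. The `sqil_train` function, the `(state, action, next_state, done)` demonstration format, and the `agent.act` / `agent.soft_q_update` interfaces are illustrative assumptions, not the authors' released code; the environment is assumed to follow the classic Gym `reset`/`step` API.

```python
import random
from collections import deque

def sqil_train(env, agent, demonstrations, num_steps=100_000, batch_size=32):
    """Hedged sketch of SQIL (Algorithm 1): soft Q-learning with constant rewards.

    `demonstrations` is assumed to be a list of (state, action, next_state, done)
    tuples with at least `batch_size` entries; `agent` is assumed to expose
    `act(obs)` and `soft_q_update(batch)`. Both are illustrative stand-ins.
    """
    # Demonstration buffer: every expert transition gets a constant reward of +1.
    demo_buffer = [(s, a, 1.0, s2, done) for (s, a, s2, done) in demonstrations]
    # Interaction buffer: every newly sampled transition gets a constant reward of 0.
    samp_buffer = deque(maxlen=50_000)

    obs = env.reset()
    for _ in range(num_steps):
        action = agent.act(obs)                  # sample from the soft Q-derived policy
        next_obs, _, done, _ = env.step(action)  # the environment reward is discarded
        samp_buffer.append((obs, action, 0.0, next_obs, done))
        obs = env.reset() if done else next_obs

        if len(samp_buffer) >= batch_size:
            # Balanced mini-batch: half demonstrations (r = 1), half sampled (r = 0).
            batch = (random.sample(demo_buffer, batch_size)
                     + random.sample(list(samp_buffer), batch_size))
            agent.soft_q_update(batch)           # squared soft Bellman error step
    return agent
```

The constant rewards are the whole trick: rewarding demonstrated transitions with 1 and everything else with 0 pushes the learned policy back toward demonstrated states without ever learning a reward function.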
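For the Experiment Setup row, the following PyTorch sketch illustrates the reported Lunar Lander architecture (two fully-connected layers of 128 hidden units) and collects the reported λsamp values in one place. The paper does not release code, so the class name, activation choice, and the default observation/action dimensions are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# λsamp weights the sampled-experience term in the squared soft Bellman error.
# Values as reported in the paper's experiment setup:
LAMBDA_SAMP = {"LunarLander": 1e-6, "CarRacing": 0.01, "default": 1.0}

class LunarLanderQNetwork(nn.Module):
    """Two fully-connected layers of 128 hidden units each, as reported for Lunar Lander.

    Illustrative reconstruction only; ReLU activations and the 8-dimensional
    observation / 4-action defaults are assumptions, not taken from the paper.
    """
    def __init__(self, obs_dim: int = 8, num_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, num_actions),  # one soft Q-value per discrete action
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)
```

According to the paper, the same two-layer, 128-unit architecture is reused for the GAIL policy and discriminator networks and the BC policy network in the Lunar Lander comparison.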