IQ-Learn: Inverse soft-Q Learning for Imitation

Authors: Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Stefano Ermon

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare IQ-Learn ("IQ") to prior work on a diverse collection of RL tasks and environments ranging from low-dimensional control tasks: CartPole, Acrobot, LunarLander, to more challenging continuous control MuJoCo tasks: HalfCheetah, Hopper, Walker and Ant. Furthermore, we test on the visually challenging Atari suite with high-dimensional image inputs. We compare on offline IL with no access to the environment while training, and online IL with environment access.
Researcher Affiliation | Academia | Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Stefano Ermon, Stanford University, {divgarg, shuvamc, cundy, tsong, ermon}@stanford.edu
Pseudocode | Yes | Pseudocode in Algorithm 1 shows our Q-learning and actor-critic variants, with differences from conventional RL algorithms in red (we optimize -J to use gradient descent). We can implement our algorithm IQ-Learn in 15 lines of code on top of standard implementations of (soft) DQN [14] for discrete control or soft actor-critic (SAC) [13] for continuous control, with a change to the objective for the Q-function. (An illustrative sketch of such an objective change appears after this table.)
Open Source Code | No | The paper mentions 'Default hyperparameters from [14, 13] work well', referring to existing works for implementation details, and cites 'Antonin Raffin. Rl baselines3 zoo. https://github.com/DLR-RM/rl-baselines3-zoo', a third-party repository. There is no explicit statement that IQ-Learn's own source code is released, nor a direct link to it.
Open Datasets | Yes | We compare IQ-Learn ("IQ") to prior work on a diverse collection of RL tasks and environments ranging from low-dimensional control tasks: CartPole, Acrobot, LunarLander, to more challenging continuous control MuJoCo tasks: HalfCheetah, Hopper, Walker and Ant. Furthermore, we test on the visually challenging Atari suite with high-dimensional image inputs.
Dataset Splits | No | The paper mentions 'tuning the entropy regularization' but does not explicitly provide details on train/validation/test dataset splits, specific percentages, or how data was partitioned for validation purposes.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for its experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software like '(soft) DQN [14]', 'soft actor-critic (SAC) [13]', and 'rl-baselines3-zoo [32]' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Default hyperparameters from [14, 13] work well, except for tuning the entropy regularization. Target networks were helpful for continuous control. We elaborate details in Appendix D. Hyperparameter settings and training details are detailed in Appendix D.
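
The "Pseudocode" row above quotes the paper's claim that IQ-Learn amounts to a small change to the Q-function objective of soft DQN or SAC. Below is a minimal, hypothetical sketch of what such a changed critic objective could look like for discrete control, assuming a PyTorch Q-network; the function names, batch layout, entropy temperature, and chi-squared-style regularization coefficient are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of an IQ-Learn-style critic update for discrete control.
# Assumes q_net(obs) returns a (batch, num_actions) tensor; names and defaults
# (alpha, gamma, the 0.5 regularization factor) are illustrative, not from the paper's code.
import torch

def soft_value(q_net, obs, alpha):
    """Soft state value V(s) = alpha * logsumexp(Q(s, .) / alpha)."""
    q = q_net(obs)                                   # (batch, num_actions)
    return alpha * torch.logsumexp(q / alpha, dim=1)

def iq_critic_loss(q_net, expert_batch, policy_batch, gamma=0.99, alpha=0.01):
    """Inverse soft-Q style objective, negated so it can be minimized by gradient descent.

    Term 1 pushes up the implied reward Q(s, a) - gamma * V(s') on expert transitions
    (with a chi-squared-style penalty on that reward); term 2 estimates the value term
    via the telescoping identity (1 - gamma) * E[V(s0)] = E_rho[V(s) - gamma * V(s')]
    over sampled transitions.
    """
    obs_e, act_e, next_obs_e, done_e = expert_batch   # expert transitions
    obs_p, next_obs_p, done_p = policy_batch          # policy (or mixed) transitions

    # Implied reward on expert data: Q(s, a) - gamma * V(s')
    q_e = q_net(obs_e).gather(1, act_e.long().unsqueeze(1)).squeeze(1)
    v_next_e = soft_value(q_net, next_obs_e, alpha)
    reward_e = q_e - (1.0 - done_e) * gamma * v_next_e

    # Concave penalty on the implied reward (illustrative chi-squared-style coefficient 0.5)
    term1 = (reward_e - 0.5 * reward_e ** 2).mean()

    # Value term over sampled transitions via the telescoping identity
    v_p = soft_value(q_net, obs_p, alpha)
    v_next_p = soft_value(q_net, next_obs_p, alpha)
    term2 = (v_p - (1.0 - done_p) * gamma * v_next_p).mean()

    # Maximize term1 - term2  =>  minimize its negation
    return -(term1 - term2)

In the continuous-control (SAC) variant, the logsumexp soft value would instead be estimated from actions sampled by the learned policy, but the overall structure of the objective stays the same.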