IQ-Learn: Inverse soft-Q Learning for Imitation

Authors: Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Stefano Ermon

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare IQ-Learn ("IQ") to prior work on a diverse collection of RL tasks and environments ranging from low-dimensional control tasks: CartPole, Acrobot, LunarLander, to more challenging continuous control MuJoCo tasks: HalfCheetah, Hopper, Walker and Ant. Furthermore, we test on the visually challenging Atari suite with high-dimensional image inputs. We compare on offline IL with no access to the environment while training, and online IL with environment access.
Researcher Affiliation | Academia | Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Stefano Ermon, Stanford University, {divgarg, shuvamc, cundy, tsong, ermon}@stanford.edu
Pseudocode | Yes | Pseudocode in Algorithm 1 shows our Q-learning and actor-critic variants, with differences from conventional RL algorithms in red (we optimize -J to use gradient descent). We can implement our algorithm IQ-Learn in 15 lines of code on top of standard implementations of (soft) DQN [14] for discrete control or soft actor-critic (SAC) [13] for continuous control, with a change to the objective for the Q-function. (An illustrative sketch of such an objective change appears after this table.)
Open Source Code | No | The paper mentions 'Default hyperparameters from [14, 13] work well', referring to existing works for implementation details, and cites 'Antonin Raffin. Rl baselines3 zoo. https://github.com/DLR-RM/rl-baselines3-zoo', a third-party repository. There is no explicit statement that IQ-Learn's own source code is released, nor a direct link to it.
Open Datasets | Yes | We compare IQ-Learn ("IQ") to prior work on a diverse collection of RL tasks and environments ranging from low-dimensional control tasks: CartPole, Acrobot, LunarLander, to more challenging continuous control MuJoCo tasks: HalfCheetah, Hopper, Walker and Ant. Furthermore, we test on the visually challenging Atari suite with high-dimensional image inputs.
Dataset Splits | No | The paper mentions 'tuning the entropy regularization' but does not explicitly provide details on train/validation/test dataset splits, specific percentages, or how data was partitioned for validation purposes.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for its experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software like '(soft) DQN [14]', 'soft actor-critic (SAC) [13]', and 'rl-baselines3-zoo [32]' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Default hyperparameters from [14, 13] work well, except for tuning the entropy regularization. Target networks were helpful for continuous control. We elaborate details in Appendix D. Hyperparameter settings and training details are detailed in Appendix D.
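
The "Pseudocode" row above quotes the paper's claim that IQ-Learn amounts to a small change to the Q-function objective of soft DQN or SAC. Below is a minimal, hypothetical sketch of what such a changed critic objective could look like for discrete control, assuming a PyTorch Q-network; the function names, batch layout, entropy temperature, and chi-squared-style regularization coefficient are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of an IQ-Learn-style critic update for discrete control.
# Assumes q_net(obs) returns a (batch, num_actions) tensor; names and defaults
# (alpha, gamma, the 0.5 regularization factor) are illustrative, not from the paper's code.
import torch

def soft_value(q_net, obs, alpha):
    """Soft state value V(s) = alpha * logsumexp(Q(s, .) / alpha)."""
    q = q_net(obs)                                   # (batch, num_actions)
    return alpha * torch.logsumexp(q / alpha, dim=1)

def iq_critic_loss(q_net, expert_batch, policy_batch, gamma=0.99, alpha=0.01):
    """Inverse soft-Q style objective, negated so it can be minimized by gradient descent.

    Term 1 pushes up the implied reward Q(s, a) - gamma * V(s') on expert transitions
    (with a chi-squared-style penalty on that reward); term 2 estimates the value term
    via the telescoping identity (1 - gamma) * E[V(s0)] = E_rho[V(s) - gamma * V(s')]
    over sampled transitions.
    """
    obs_e, act_e, next_obs_e, done_e = expert_batch   # expert transitions
    obs_p, next_obs_p, done_p = policy_batch          # policy (or mixed) transitions

    # Implied reward on expert data: Q(s, a) - gamma * V(s')
    q_e = q_net(obs_e).gather(1, act_e.long().unsqueeze(1)).squeeze(1)
    v_next_e = soft_value(q_net, next_obs_e, alpha)
    reward_e = q_e - (1.0 - done_e) * gamma * v_next_e

    # Concave penalty on the implied reward (illustrative chi-squared-style coefficient 0.5)
    term1 = (reward_e - 0.5 * reward_e ** 2).mean()

    # Value term over sampled transitions via the telescoping identity
    v_p = soft_value(q_net, obs_p, alpha)
    v_next_p = soft_value(q_net, next_obs_p, alpha)
    term2 = (v_p - (1.0 - done_p) * gamma * v_next_p).mean()

    # Maximize term1 - term2  =>  minimize its negation
    return -(term1 - term2)

In the continuous-control (SAC) variant, the logsumexp soft value would instead be estimated from actions sampled by the learned policy, but the overall structure of the objective stays the same.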