IQ-Learn: Inverse soft-Q Learning for Imitation
Authors: Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Stefano Ermon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare IQ-Learn ("IQ") to prior work on a diverse collection of RL tasks and environments ranging from low-dimensional control tasks: CartPole, Acrobot, LunarLander to more challenging continuous control MuJoCo tasks: HalfCheetah, Hopper, Walker and Ant. Furthermore, we test on the visually challenging Atari Suite with high-dimensional image inputs. We compare on offline IL with no access to the environment while training, and online IL with environment access. |
| Researcher Affiliation | Academia | Divyansh Garg Shuvam Chakraborty Chris Cundy Jiaming Song Stefano Ermon Stanford University {divgarg, shuvamc, cundy, tsong, ermon}@stanford.edu |
| Pseudocode | Yes | Pseudocode in Algorithm 1 shows our Q-learning and actor-critic variants, with differences from conventional RL algorithms in red (we optimize -J to use gradient descent). We can implement our algorithm IQ-Learn in 15 lines of code on top of standard implementations of (soft) DQN [14] for discrete control or soft actor-critic (SAC) [13] for continuous control, with a change on the objective for the Q-function (a hedged sketch of such a modified critic objective is given after this table). |
| Open Source Code | No | The paper mentions 'Default hyperparameters from [14, 13] work well', referring to existing works for implementation details, and cites 'Antonin Raffin. Rl baselines3 zoo. https://github.com/DLR-RM/rl-baselines3-zoo' as a reference, which is a third-party repository. There is no explicit statement about IQ-Learn's own source code being released or a direct link to it. |
| Open Datasets | Yes | We compare IQ-Learn ("IQ") to prior work on a diverse collection of RL tasks and environments ranging from low-dimensional control tasks: CartPole, Acrobot, LunarLander to more challenging continuous control MuJoCo tasks: HalfCheetah, Hopper, Walker and Ant. Furthermore, we test on the visually challenging Atari Suite with high-dimensional image inputs. |
| Dataset Splits | No | The paper mentions 'tuning the entropy regularization' but does not explicitly provide details on train/validation/test dataset splits, specific percentages, or how data was partitioned for validation purposes. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software like '(soft) DQN [14]', 'soft actor-critic (SAC) [13]', and 'rl-baselines3-zoo [32]' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Default hyperparameters from [14, 13] work well, except for tuning the entropy regularization. Target networks were helpful for continuous control. Hyperparameter settings and training details are given in Appendix D. |
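
The pseudocode row above notes that IQ-Learn only swaps out the Q-function objective of a standard soft DQN or SAC implementation. As a rough illustration, the following is a minimal PyTorch sketch of what such a modified critic loss could look like for discrete actions. The χ²-style regularizer `phi(x) = x - x**2 / (4 * alpha)`, the expert-only batch, and the use of the expert batch states as a stand-in for the initial-state term are assumptions about one common variant rather than details taken from the paper; names such as `iq_critic_loss` and `soft_value` are hypothetical.

```python
import torch


def soft_value(q_net, obs, alpha):
    """Soft value V(s) = alpha * logsumexp(Q(s, .) / alpha) for discrete actions."""
    q = q_net(obs)  # shape: (batch, num_actions)
    return alpha * torch.logsumexp(q / alpha, dim=1)


def iq_critic_loss(q_net, expert_batch, gamma, alpha):
    """Sketch of an inverse soft-Q objective on expert data only.

    Approximates: maximize E_expert[phi(Q(s,a) - gamma * V(s'))] - (1 - gamma) * E[V(s0)],
    using the chi^2-style regularizer phi(x) = x - x^2 / (4 * alpha) and the expert
    batch states as a stand-in for samples from the initial-state distribution.
    """
    obs, act, next_obs, done = expert_batch  # act: (batch,) long, done: (batch,) float in {0, 1}

    # Q(s, a) for the expert actions.
    q_sa = q_net(obs).gather(1, act.long().unsqueeze(1)).squeeze(1)

    # Implicit reward r(s, a) = Q(s, a) - gamma * V(s'); zero out V(s') at episode ends.
    v_next = soft_value(q_net, next_obs, alpha)
    reward = q_sa - (1.0 - done) * gamma * v_next

    # Concave regularizer applied to the implicit reward (chi^2 case, an assumption here).
    phi = reward - reward ** 2 / (4.0 * alpha)

    # Initial-state value term, approximated with the states in the expert batch.
    v0 = soft_value(q_net, obs, alpha)

    # Negate so that ordinary gradient descent maximizes the objective (optimize -J).
    return -(phi.mean() - (1.0 - gamma) * v0.mean())
```

Practical choices that this sketch omits include computing V(s') with a target network (which the paper reports as helpful for continuous control), mixing policy rollouts with expert data in the online setting, and the actor-critic (SAC-based) variant for continuous actions.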