Hybrid Inverse Reinforcement Learning

Authors: Juntao Ren, Gokul Swamy, Steven Wu, Drew Bagnell, Sanjiban Choudhury

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we find that our approaches are significantly more sample efficient than standard inverse RL and several other baselines on a suite of continuous control tasks.
Researcher Affiliation | Collaboration | ¹Cornell University, ²Carnegie Mellon University, ³Aurora Innovation.
Pseudocode | Yes | Algorithm 1: (Dual) IRL (Ziebart et al., 2008b), Algorithm 2: Hybrid Policy Emulation (HyPE), Algorithm 3: Hybrid RL (HyRL), Algorithm 4: Hybrid Policy Emulation w/ Resets (HyPER)
Open Source Code | Yes | We release the code we used for all of our experiments at https://github.com/jren03/garage.
Open Datasets | Yes | On the MuJoCo locomotion benchmark environments (Brockman et al., 2016)... and Our next set of experiments consider the D4RL (Fu et al., 2020) antmaze-large environments
Dataset Splits | No | The paper mentions 'validation data' in Algorithms 2 and 3 ('Return best of π1:T on validation data.' and 'Return Best of π1:N, Q1:N on validation data.'), implying its use, but it does not specify the split percentages, sample counts, or methodology for constructing the validation set from the primary MuJoCo or D4RL datasets.
Hardware Specification | No | The paper does not provide hardware details such as GPU or CPU models, memory specifications, or cloud computing instance types used to run the experiments.
Software Dependencies | No | The paper mentions several software components, such as Optimistic Adam, Soft Actor-Critic (Haarnoja et al., 2018) as implemented by Raffin et al. (2019), and TD3+BC (Fujimoto & Gu, 2021), but it does not give version numbers for these dependencies or for underlying frameworks such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We implement HyPE by updating the policy and critic networks in Soft Actor-Critic (Haarnoja et al., 2018) with expert and learner samples. We implement HyPER by running model-based policy optimization (Janner et al., 2019) and resetting to expert states in the learned model. No reward information is provided in either case, so we also train a discriminator network. Appendix B includes additional implementation details and hyperparameters.
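
To make the Experiment Setup row concrete, below is a minimal, hypothetical sketch of the kind of update loop it describes: a SAC-style agent trained on a half-expert, half-learner batch whose rewards come from a learned discriminator rather than the environment. All names here (Discriminator, hype_style_update, the buffer and agent interfaces) are illustrative assumptions, not the API of the released garage code, and the reward parameterization is a common GAIL-style choice that may differ from the paper's.

import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Classifies (state, action) pairs as expert (label 1) vs. learner (label 0)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def reward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # GAIL-style surrogate reward -log(1 - D(s, a)); illustrative only.
        return -nn.functional.logsigmoid(-self(obs, act))


def hype_style_update(disc, disc_opt, agent, expert_buffer, learner_buffer, batch_size=256):
    """One hybrid update: fit the discriminator, then take an off-policy
    actor-critic step on a half-expert / half-learner batch whose rewards
    are relabeled by the discriminator (no environment reward is used)."""
    exp = expert_buffer.sample(batch_size // 2)  # assumed: dict with obs, act, next_obs, done
    lrn = learner_buffer.sample(batch_size // 2)

    # 1) Discriminator step: expert transitions -> 1, learner transitions -> 0.
    logits_e = disc(exp["obs"], exp["act"])
    logits_l = disc(lrn["obs"], lrn["act"])
    bce = nn.functional.binary_cross_entropy_with_logits
    disc_loss = bce(logits_e, torch.ones_like(logits_e)) + bce(logits_l, torch.zeros_like(logits_l))
    disc_opt.zero_grad()
    disc_loss.backward()
    disc_opt.step()

    # 2) Policy/critic step on the mixed batch with discriminator rewards.
    batch = {k: torch.cat([exp[k], lrn[k]]) for k in exp}
    with torch.no_grad():
        batch["reward"] = disc.reward(batch["obs"], batch["act"])
    agent.update(batch)  # assumed SAC-like interface consuming an off-policy batch

The half-expert, half-learner batch is what makes the update "hybrid": a standard inverse-RL inner loop would train the critic on learner samples alone, whereas here expert transitions enter the off-policy update directly.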