Multi-task Hierarchical Adversarial Inverse Reinforcement Learning

Authors: Jiayu Chen, Dipesh Tamboli, Tian Lan, Vaneet Aggarwal

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Evaluation and Main Results: MH-AIRL is proposed to learn a multi-task hierarchical policy from a mixture of (unstructured) expert demonstrations. The learned policy can be adopted to any task sampled from a distribution of tasks. In this section: (1) We provide an ablation study with respect to the three main components of our algorithm: context-based multi-task/meta learning, option/hierarchical learning, and imitation learning. (2) We show that hierarchical policy learning can significantly improve the agent's performance on challenging long-horizon tasks. (3) Through qualitative and quantitative results, we show that our algorithm can capture the subtask structure within the expert demonstrations and that the learned basic skills for the subtasks (i.e., options) can be transferred to tasks not within the task distribution to aid learning, for better transferability.
Researcher Affiliation | Academia | (1) School of Industrial Engineering, Purdue University, West Lafayette, IN 47907, USA; (2) Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA; (3) Department of Electrical and Computer Engineering, George Washington University, Washington DC 20052, USA; (4) Department of Computer Science and AI Initiative, King Abdullah University of Science and Technology, Thuwal 23955, KSA.
Pseudocode | Yes | Algorithm 1: Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL). (A hedged training-loop sketch is given below the table.)
Open Source Code | Yes | Code for reproducing all the results is available at https://github.com/LucasCJYSDL/Multi-task_Hierarchical_AIRL.
Open Datasets | Yes | The evaluation is based on three MuJoCo (Todorov et al., 2012) locomotion tasks and the Kitchen task from the D4RL benchmark (Fu et al., 2020). (A minimal dataset-loading example follows the table.)
Dataset Splits | No | The paper states, 'All the algorithms are trained with the same expert data, and evaluated on the same set of test tasks (not contained in the demonstrations)', but it does not provide specific percentages or sample counts for training, validation, and test splits, nor does it refer to standard predefined splits for the datasets used (MuJoCo, D4RL) that would ensure reproducibility of the data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU or CPU models, processor types, or memory amounts, used for running its experiments. It only mentions the simulation environments, such as MuJoCo and D4RL.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers (e.g., Python, PyTorch, TensorFlow versions), that would be needed to replicate the experiment environment.
Experiment Setup | Yes | After careful fine-tuning, we established a training iteration ratio of 1:3:10 for the discriminator, hierarchical policy, and variational posteriors, respectively. Despite this complexity, our evaluations across a wide variety of tasks utilized a consistent set of hyperparameters, showing the robustness of our approach. (An illustrative reading of this update ratio is sketched below the table.)
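
On the Pseudocode row: Algorithm 1 of the paper alternates updates of an AIRL-style discriminator, a context- and option-conditioned hierarchical policy, and variational posteriors over the task context and the option. The following is a minimal, hypothetical sketch of such an adversarial loop; all networks, dimensions, and loss choices here are placeholder assumptions and do not reproduce the authors' released implementation.

```python
# Hypothetical sketch of an MH-AIRL-style adversarial training loop; placeholder
# MLPs and synthetic batches stand in for the paper's actual components.
import torch
import torch.nn as nn
import torch.nn.functional as F

S, A, C, O = 8, 2, 4, 3  # illustrative state, action, context, and option sizes

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 64), nn.Tanh(), nn.Linear(64, o))

disc = mlp(S + A + C, 1)      # discriminator over (state, action, context)
high_pi = mlp(S + C, O)       # high-level policy: option logits
low_pi = mlp(S + C + O, A)    # low-level policy: action given the active option
post_c = mlp(S + A, C)        # variational posterior over the task context
post_o = mlp(S + A, O)        # variational posterior over the option

opt_disc = torch.optim.Adam(disc.parameters(), lr=3e-4)
opt_pi = torch.optim.Adam(list(high_pi.parameters()) + list(low_pi.parameters()), lr=3e-4)
opt_post = torch.optim.Adam(list(post_c.parameters()) + list(post_o.parameters()), lr=3e-4)

def fake_batch(n=64):
    """Synthetic (state, action, context, option one-hot) batch standing in for data."""
    s, a, c = torch.randn(n, S), torch.randn(n, A), torch.randn(n, C)
    o = F.one_hot(torch.randint(0, O, (n,)), O).float()
    return s, a, c, o

for it in range(10):  # a few illustrative outer iterations
    # 1) Discriminator step: expert samples (label 1) vs. policy samples (label 0).
    s_e, a_e, c_e, _ = fake_batch()
    s_p, a_p, c_p, _ = fake_batch()
    logits_e = disc(torch.cat([s_e, a_e, c_e], dim=-1))
    logits_p = disc(torch.cat([s_p, a_p, c_p], dim=-1))
    d_loss = (F.binary_cross_entropy_with_logits(logits_e, torch.ones_like(logits_e))
              + F.binary_cross_entropy_with_logits(logits_p, torch.zeros_like(logits_p)))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 2) Hierarchical-policy step: treat the discriminator score as a learned reward
    #    (a simplified stand-in for the paper's on-policy RL update).
    s, _, c, _ = fake_batch()
    option = F.gumbel_softmax(high_pi(torch.cat([s, c], dim=-1)), hard=True)
    action = low_pi(torch.cat([s, c, option], dim=-1))
    pi_loss = -disc(torch.cat([s, action, c], dim=-1)).mean()
    opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()

    # 3) Posterior step: make context and option recoverable from behavior
    #    (the mutual-information terms), here as simple regression/classification losses.
    s, a, c, o = fake_batch()
    post_loss = (F.mse_loss(post_c(torch.cat([s, a], dim=-1)), c)
                 + F.cross_entropy(post_o(torch.cat([s, a], dim=-1)), o.argmax(dim=-1)))
    opt_post.zero_grad(); post_loss.backward(); opt_post.step()
```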
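On the Open Datasets row: the Kitchen task comes from the D4RL benchmark, which exposes its datasets through Gym environment wrappers. A minimal loading example follows, assuming the `gym` and `d4rl` packages are installed; the `kitchen-complete-v0` environment ID is an assumption, since the paper does not state which Kitchen variant was used.

```python
# Minimal D4RL Kitchen loading example (environment ID is an assumption).
import gym
import d4rl  # noqa: F401 -- importing d4rl registers its environments with gym

env = gym.make("kitchen-complete-v0")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals'
print(dataset["observations"].shape, dataset["actions"].shape)
```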
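On the Experiment Setup row: the reported 1:3:10 ratio can be read as, per outer round, one discriminator update, three hierarchical-policy updates, and ten variational-posterior updates. A minimal scheduling sketch under that reading is below; the update functions are placeholders, not the authors' implementation.

```python
# Illustrative reading of the 1:3:10 training-iteration ratio (placeholder updates).
DISC_STEPS, POLICY_STEPS, POSTERIOR_STEPS = 1, 3, 10

def update_discriminator(): pass   # placeholder for one discriminator gradient step
def update_policy(): pass          # placeholder for one hierarchical-policy step
def update_posteriors(): pass      # placeholder for one variational-posterior step

for outer_round in range(100):
    for _ in range(DISC_STEPS):
        update_discriminator()
    for _ in range(POLICY_STEPS):
        update_policy()
    for _ in range(POSTERIOR_STEPS):
        update_posteriors()
```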