Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations

Authors: Guanren Qiao, Guiliang Liu, Pascal Poupart, Zhiqiang Xu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in both discrete and continuous environments show that MMICRL outperforms other baselines in terms of constraint recovery and control performance.
Researcher Affiliation | Academia | Guanren Qiao (1), Guiliang Liu (1), Pascal Poupart (2,3), Zhiqiang Xu (4); (1) School of Data Science, The Chinese University of Hong Kong, Shenzhen; (2) University of Waterloo; (3) Vector Institute; (4) Mohamed bin Zayed University of Artificial Intelligence
Pseudocode | Yes | Algorithm 1: Probing_Sets [...] Algorithm 2: Multi-Modal Inverse Constrained Reinforcement Learning (MMICRL)
Open Source Code | Yes | Our implementation is available at: https://github.com/qiaoguanren/Multi-Modal-Inverse-Constrained-Reinforcement-Learning.
Open Datasets | Yes | The demonstration data are assumed to have a zero violation rate. We create a 7x7 Gridworld map and design four distinct constraint map settings. [...] To facilitate constraint inference, a demonstration dataset containing expert trajectories is provided for each environment [7]. [...] For extracting features of cars and roads, we use the features collector from Commonroad RL [39]. The constraints that we are interested in are 1) Car distance 20m (agent 0) and 2) Car distance 40m (agent 1). Figure 6 shows the distribution of car distance in expert demonstrations for agent 0 and agent 1.
Dataset Splits | No | The paper mentions 'demonstration data' and 'expert trajectories' but does not specify any training, validation, or test splits or percentages.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software such as 'MuJoCo' and 'Commonroad RL' but does not give version numbers for these or for other ancillary components, such as programming languages or libraries.
Experiment Setup | No | Appendix A.2 reports the detailed settings. [The main text defers to the Appendix for detailed settings, but the Appendix is not provided, so specific setup details such as hyperparameters do not appear in the main text.]
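
The Open Datasets entry mentions a 7x7 Gridworld with constraint maps and demonstrations assumed to have a zero violation rate. The following is a minimal sketch of how such a check could look, not the authors' implementation. The constraint cells, the trajectories, and the helper names (`violates`, `violation_rate`) are hypothetical, and "violation rate" is assumed here to mean the fraction of trajectories that enter at least one constrained cell.

```python
GRID_SIZE = 7  # 7x7 map, as stated in the summary

# Hypothetical constraint layout as a set of (row, col) cells; not from the paper.
CONSTRAINED_CELLS = {(3, 2), (3, 3), (3, 4)}


def in_bounds(state):
    """Check that a (row, col) state lies inside the 7x7 grid."""
    row, col = state
    return 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE


def violates(trajectory, constrained_cells):
    """Return True if the trajectory visits at least one constrained cell."""
    return any(state in constrained_cells for state in trajectory)


def violation_rate(trajectories, constrained_cells):
    """Fraction of trajectories that enter a constrained cell (assumed definition)."""
    if not trajectories:
        return 0.0
    count = sum(1 for t in trajectories if violates(t, constrained_cells))
    return count / len(trajectories)


# Hypothetical expert demonstrations that walk along the left and right edges.
expert_demos = [
    [(row, 0) for row in range(6, -1, -1)],
    [(row, 6) for row in range(6, -1, -1)],
]
# Hypothetical unsafe trajectory that crosses the constrained cell (3, 3).
unsafe_demo = [(6, 3), (5, 3), (4, 3), (3, 3), (2, 3)]

all_valid = all(in_bounds(s) for t in expert_demos + [unsafe_demo] for s in t)
expert_rate = violation_rate(expert_demos, CONSTRAINED_CELLS)
mixed_rate = violation_rate(expert_demos + [unsafe_demo], CONSTRAINED_CELLS)
print(expert_rate, mixed_rate)
```

Under these assumptions the expert set yields a rate of zero, while adding the unsafe trajectory raises it to one third. A per-step rate would be an equally plausible definition.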