Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations
Authors: Guanren Qiao, Guiliang Liu, Pascal Poupart, Zhiqiang Xu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in both discrete and continuous environments show that MMICRL outperforms other baselines in terms of constraint recovery and control performance. |
| Researcher Affiliation | Academia | Guanren Qiao¹, Guiliang Liu¹, Pascal Poupart²,³, Zhiqiang Xu⁴; ¹School of Data Science, The Chinese University of Hong Kong, Shenzhen; ²University of Waterloo; ³Vector Institute; ⁴Mohamed bin Zayed University of Artificial Intelligence |
| Pseudocode | Yes | Algorithm 1: Probing_Sets [...] Algorithm 2: Multi-Modal Inverse Constrained Reinforcement Learning (MMICRL) |
| Open Source Code | Yes | Our implementation is available at: https://github.com/qiaoguanren/Multi-Modal-Inverse-Constrained-Reinforcement-Learning. |
| Open Datasets | Yes | The demonstration data are assumed to have a zero violation rate. We create a 7x7 Gridworld map and design four distinct constraint map settings. [...] To facilitate constraint inference, a demonstration dataset containing expert trajectories is provided for each environment [7]. [...] For extracting features of cars and roads, we use the features collector from Commonroad RL [39]. The constraints that we are interested in are 1) Car distance 20m (agent 0) and 2) Car distance 40m (agent 1). Figure 6 shows the distribution of car distance in expert demonstrations for agent 0 and agent 1. |
| Dataset Splits | No | The paper mentions 'demonstration data' and 'expert trajectories' but does not specify any training, validation, or test dataset splits or percentages. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like 'MuJoCo' and 'Commonroad RL' but does not provide specific version numbers for these or other ancillary software components, such as programming languages or libraries. |
| Experiment Setup | No | Appendix A.2 reports the detailed settings. [The main text refers to the Appendix for detailed settings, but the Appendix is not provided, thus specific experimental setup details like hyperparameters are not in the main text.] |