Distributed Inverse Constrained Reinforcement Learning for Multi-agent Systems

Authors: Shicheng Liu, Minghui Zhu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Simulations are done to validate the proposed algorithm." (Abstract); "This section presents two simulation examples." (Section 6)
Researcher Affiliation | Academia | School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, PA 16802, USA; {sfl5539,muz16}@psu.edu
Pseudocode | Yes | "Algorithm 1 MEML D-ICRL"; "Algorithm 2 Inner process"
Open Source Code | Yes | "We also include the code in the supplementary materials."
Open Datasets | No | The first example uses a grid world introduced in [11], but the demonstration data is generated by the authors. For the second example, the authors state "We first control the simulated drones to their target doors, record nine pairs of trajectories, and distribute four and five pairs to two learners respectively," indicating a custom dataset with no explicit public access.
Dataset Splits | No | The paper mentions distributing "10, 20, 30, 40 demonstrated trajectories" to learners and a "total 100 demonstrated trajectories" for baselines (see the illustrative sketch after this table), but does not specify explicit train/validation/test splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification | No | The main paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments, although the ethics review notes that "The compute type is included in the Appendix."
Software Dependencies | No | The paper mentions using "Gazebo" for simulation but does not list specific software dependencies with version numbers (e.g., library names and versions) needed to replicate the experiments.
Experiment Setup | No | The paper states that "The detailed simulation setup is included in the Appendix" and "We include some details in Section 6 and the rest details are included in the Appendix," but the main text does not provide specific setup details such as hyperparameter values or training configurations.
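
Illustrative sketch for the Dataset Splits row above: the paper quotes per-learner counts of demonstrated trajectories (10, 20, 30, 40) and a pool of 100 trajectories for centralized baselines, but no formal split. The following is a hypothetical Python sketch, not the authors' released code; the function name, placeholder trajectory data, and the reading of the counts as one-count-per-learner are assumptions made only to show how such a per-learner distribution could be constructed.

```python
# Hypothetical sketch (assumption, not the authors' code): distribute a pool of
# demonstrated trajectories among learners without overlap, mirroring the
# counts quoted in the paper (e.g., 10/20/30/40 trajectories per learner).
import random
from typing import List, Sequence


def partition_demonstrations(trajectories: Sequence, counts: List[int], seed: int = 0) -> List[list]:
    """Randomly assign counts[i] trajectories to learner i, with no trajectory reused."""
    if sum(counts) > len(trajectories):
        raise ValueError("Not enough demonstrated trajectories to distribute.")
    rng = random.Random(seed)
    pool = list(trajectories)
    rng.shuffle(pool)
    partitions, start = [], 0
    for count in counts:
        partitions.append(pool[start:start + count])
        start += count
    return partitions


# Example: 100 placeholder trajectories split 10/20/30/40 across four learners.
demos = [f"trajectory_{i}" for i in range(100)]
learner_sets = partition_demonstrations(demos, counts=[10, 20, 30, 40])
print([len(s) for s in learner_sets])  # -> [10, 20, 30, 40]
```

This only illustrates the kind of data-distribution step the report refers to; the paper itself does not publish the split procedure or the trajectories.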