Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch
Authors: Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Adrian Weller, Volkan Cevher
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically demonstrate the stable performance of our algorithm compared to the standard MCE IRL algorithm under transition dynamics mismatches in both finite and continuous MDP problems. |
| Researcher Affiliation | Collaboration | Luca Viano (LIONS, EPFL); Yu-Ting Huang (EPFL); Parameswaran Kamalaruban (The Alan Turing Institute); Adrian Weller (University of Cambridge & The Alan Turing Institute); Volkan Cevher (LIONS, EPFL) |
| Pseudocode | Yes | Algorithm 1 Robust MCE IRL via Markov Game |
| Open Source Code | Yes | Code Repository https://github.com/lviano/Robust_MCE_IRL/tree/master/robust_IRLcode |
| Open Datasets | No | The paper describes generating data within custom GRIDWORLD and OBJECTWORLD environments, rather than using a pre-existing publicly available dataset with a specific link or citation for access. It defines how the environments are set up but does not provide concrete access information for a dataset. |
| Dataset Splits | No | The paper specifies experimental parameters like noise levels ($\epsilon_L$, $\epsilon_E$) and algorithm parameter $\alpha$, but it does not describe specific train/validation/test dataset splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | We used an internal cluster with CPU nodes for the experiments; but we do not have an estimate of the total amount of compute. |
| Software Dependencies | No | The paper mentions using the 'deep MCE IRL algorithm from [44]' but does not provide specific software dependencies with version numbers (e.g., PyTorch version, Python version, specific library versions). |
| Experiment Setup | Yes | We have provided all the training and hyperparameters details in the Experiments section, and in the Appendix. In our experiments, we set $T^{\mathrm{ref}}$ to be deterministic, and $\bar{T}$ to be uniform. Then, one can easily show that $d_{\mathrm{dyn}}(T^{L,\epsilon_L}, T^{E,\epsilon_E}) = 2\left(1 - \frac{1}{\lvert S \rvert}\right)\lvert \epsilon_L - \epsilon_E \rvert$. ...our robust MCE IRL algorithm with different values of $\alpha \in \{0.8, 0.85, 0.9, 0.95\}$... |
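
The closed-form mismatch quoted in the Experiment Setup row can be checked numerically. The sketch below is not taken from the paper's repository and rests on two assumptions that the table does not state: that the perturbed dynamics are the convex mixture (1 − ϵ)·T_ref + ϵ·T_uniform, and that d_dyn is the largest L1 distance between next-state distributions over all state-action pairs. All function names are hypothetical.

```python
import random


def random_deterministic_dynamics(n_states, n_actions, seed=0):
    """Build a deterministic reference model: one row per (state, action)
    pair, each row a one-hot distribution over next states."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_states * n_actions):
        nxt = rng.randrange(n_states)
        rows.append([1.0 if s == nxt else 0.0 for s in range(n_states)])
    return rows


def mixed_dynamics(t_ref, eps):
    """Assumed perturbation: (1 - eps) * t_ref + eps * uniform, row by row."""
    n_states = len(t_ref[0])
    return [[(1.0 - eps) * p + eps / n_states for p in row] for row in t_ref]


def d_dyn(t_a, t_b):
    """Assumed distance: max over (state, action) rows of the L1 distance."""
    return max(
        sum(abs(pa - pb) for pa, pb in zip(row_a, row_b))
        for row_a, row_b in zip(t_a, t_b)
    )


def closed_form(n_states, eps_l, eps_e):
    """The expression quoted in the table: 2 (1 - 1/|S|) |eps_L - eps_E|."""
    return 2.0 * (1.0 - 1.0 / n_states) * abs(eps_l - eps_e)


if __name__ == "__main__":
    n_states, n_actions = 25, 4  # illustrative sizes, not the paper's
    t_ref = random_deterministic_dynamics(n_states, n_actions)
    for eps_l, eps_e in [(0.0, 0.1), (0.1, 0.3), (0.2, 0.05)]:
        numeric = d_dyn(mixed_dynamics(t_ref, eps_l), mixed_dynamics(t_ref, eps_e))
        analytic = closed_form(n_states, eps_l, eps_e)
        print(f"eps_L={eps_l}, eps_E={eps_e}: numeric={numeric:.6f}, closed form={analytic:.6f}")
```

Under these assumptions, every row differs by (ϵ_E − ϵ_L) times the gap between a one-hot vector and the uniform vector. That gap has L1 norm 2(1 − 1/|S|), so the numeric and closed-form values agree for any deterministic reference model.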