Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification
Authors: Joar Max Viktor Skalse, Alessandro Abate
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we analyse how sensitive the IRL problem is to misspecification of the behavioural model. Specifically, we provide necessary and sufficient conditions that completely characterise how the observed data may differ from the assumed behavioural model without incurring an error above a given threshold. In addition to this, we also characterise the conditions under which a behavioural model is robust to small perturbations of the observed policy, and we analyse how robust many behavioural models are to misspecification of their parameter values (such as e.g. the discount rate). Our analysis suggests that the IRL problem is highly sensitive to misspecification, in the sense that very mild misspecification can lead to very large errors in the inferred reward function. |
| Researcher Affiliation | Academia | Joar Skalse & Alessandro Abate, Department of Computer Science, Oxford University |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to repositories. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical analysis; it does not report on experiments with datasets, so no training data is mentioned as publicly available. |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical analysis; it does not report on experiments with datasets, so no validation data splits are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware used for computation. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis; it does not mention any specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical analysis; it does not include details about an experimental setup, such as hyperparameters or training configurations. |