Compatible Reward Inverse Reinforcement Learning
Authors: Alberto Maria Metelli, Matteo Pirotta, Marcello Restelli
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From Sec. 7 (Experimental results): We evaluate CR-IRL against some popular IRL algorithms in both discrete and continuous domains: the Taxi problem (discrete), and the Linear Quadratic Gaussian and Car on the Hill environments (continuous). We provide here the most significant results; the full data are reported in App. D. |
| Researcher Affiliation | Academia | Alberto Maria Metelli, DEIB, Politecnico di Milano, Italy (albertomaria.metelli@polimi.it); Matteo Pirotta, SequeL Team, Inria Lille, France (matteo.pirotta@inria.fr); Marcello Restelli, DEIB, Politecnico di Milano, Italy (marcello.restelli@polimi.it) |
| Pseudocode | Yes | Alg 1: CR-IRL algorithm. |
| Open Source Code | No | The paper does not provide concrete access to source code. There are no links to repositories or explicit statements about code availability. |
| Open Datasets | Yes | The Taxi domain is defined in [35], the Linear Quadratic Gaussian regulator in [36], and the continuous Car on the Hill domain in [37]. |
| Dataset Splits | No | No specific dataset split information (percentages, sample counts, or explicit cross-validation setup) was found. The paper mentions using a number of expert trajectories but not how they are partitioned for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We assume the expert plays an ϵ-Boltzmann policy with fixed ϵ: $\pi_{\theta,\epsilon}(a \mid s) = (1-\epsilon)\,\frac{e^{\theta_a^\top \zeta_s}}{\sum_{a' \in \mathcal{A}} e^{\theta_{a'}^\top \zeta_s}} + \frac{\epsilon}{\lvert\mathcal{A}\rvert}$, where the policy features $\zeta_s$ are the following state features: current location, passenger location, destination location, and whether the passenger has already been picked up. ... with a Gaussian policy with variance $\sigma^2 = 0.01$. ... a noisy expert's policy in which a random action is selected with probability ϵ = 0.1. ... a radial basis function network. (A hedged implementation sketch of these policies follows the table.) |
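As a concrete reading of the Experiment Setup row, below is a minimal Python sketch of the two expert policies: the ϵ-Boltzmann policy over linear state features and the fixed-variance Gaussian policy used in the LQG domain. The function names, array shapes, and NumPy usage are assumptions for illustration only; the paper itself releases no code.

```python
import numpy as np

def epsilon_boltzmann_policy(theta, zeta_s, epsilon):
    """epsilon-Boltzmann policy pi_{theta,eps}(a|s): a softmax over the
    linear action preferences theta_a^T zeta_s, mixed with a uniform
    distribution over the |A| actions with weight epsilon.

    theta:   (n_actions, n_features) array, one preference vector per action
    zeta_s:  (n_features,) state feature vector
    epsilon: fixed mixing coefficient in [0, 1]
    """
    prefs = theta @ zeta_s               # theta_a^T zeta_s for each action a
    prefs -= prefs.max()                 # shift for numerical stability
    softmax = np.exp(prefs) / np.exp(prefs).sum()
    return (1.0 - epsilon) * softmax + epsilon / theta.shape[0]

def gaussian_policy_action(mean_action, sigma2=0.01, rng=None):
    """Gaussian policy as in the LQG domain: a ~ N(mean_action, sigma^2)."""
    rng = rng or np.random.default_rng()
    return rng.normal(loc=mean_action, scale=np.sqrt(sigma2))

# Example usage with arbitrary parameters (shapes are illustrative only).
rng = np.random.default_rng(0)
theta = rng.standard_normal((6, 4))      # e.g. 6 actions, 4 state features
zeta_s = rng.standard_normal(4)
probs = epsilon_boltzmann_policy(theta, zeta_s, epsilon=0.1)
action = rng.choice(len(probs), p=probs)
```

Note that the returned distribution always sums to one: the softmax term contributes $1-\epsilon$ and the uniform term contributes $\epsilon$, matching the mixture form of the reconstructed equation above.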