Compatible Reward Inverse Reinforcement Learning

Authors: Alberto Maria Metelli, Matteo Pirotta, Marcello Restelli

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate CR-IRL against some popular IRL algorithms both in discrete and in continuous domains: the Taxi problem (discrete), the Linear Quadratic Gaussian and the Car on the Hill environments (continuous). We provide here the most significant results; the full data are reported in App. D."
Researcher Affiliation | Academia | Alberto Maria Metelli, DEIB, Politecnico di Milano, Italy (albertomaria.metelli@polimi.it); Matteo Pirotta, SequeL Team, Inria Lille, France (matteo.pirotta@inria.fr); Marcello Restelli, DEIB, Politecnico di Milano, Italy (marcello.restelli@polimi.it)
Pseudocode | Yes | "Alg. 1: CR-IRL algorithm." (A minimal sketch of the algorithm's core feature-construction step appears after the table.)
Open Source Code | No | The paper does not provide concrete access to source code: there are no links to repositories and no explicit statements about code availability.
Open Datasets | Yes | The Taxi domain is defined in [35]; the Linear Quadratic Gaussian regulator in [36]; the continuous Car on the Hill domain in [37]. (An environment-loading example appears after the table.)
Dataset Splits | No | No specific dataset split information (percentages, sample counts, or explicit cross-validation setup) was found. The paper mentions using a number of expert trajectories but not how they are partitioned for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "We assume the expert plays an ϵ-Boltzmann policy with fixed ϵ: $\pi_{\theta,\epsilon}(a \mid s) = (1-\epsilon)\,\frac{e^{\theta_a^\top \zeta_s}}{\sum_{a' \in \mathcal{A}} e^{\theta_{a'}^\top \zeta_s}} + \frac{\epsilon}{|\mathcal{A}|}$, where the policy features $\zeta_s$ are the following state features: current location, passenger location, destination location, and whether the passenger has already been picked up. ... with a Gaussian policy with variance $\sigma^2 = 0.01$. ... a noisy expert's policy in which a random action is selected with probability ϵ = 0.1. ... a radial basis function network." (An illustrative implementation of the ϵ-Boltzmann policy appears below.)
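The paper's Alg. 1 is not reproduced here. As a minimal sketch of its first phase only, assuming the expert's policy has already been fitted and that per-sample score vectors ∇θ log πθ(a|s) and discounted visitation weights have been estimated from the expert trajectories (the function name and interface below are ours, not the paper's):

```python
import numpy as np

def compatible_reward_features(scores, weights):
    """Sketch of CR-IRL's first phase: reward features compatible with
    first-order optimality of the expert's policy.

    scores  : (N, d) array; row t is grad_theta log pi_theta(a_t | s_t)
              at the t-th expert state-action sample (d policy params).
    weights : (N,) array of discounted visitation weights (e.g. gamma^t).

    Returns an (N, N - rank) orthonormal basis of feature valuations phi
    on the samples; for each basis vector the estimated policy gradient
    sum_t weights[t] * scores[t] * phi[t] is zero, so the expert's
    policy is a stationary point of the induced objective.
    """
    weighted = scores * weights[:, None]      # (N, d)
    _, s, vt = np.linalg.svd(weighted.T)      # null space of the (d, N) matrix
    rank = int(np.sum(s > 1e-10))
    return vt[rank:].T                        # basis of the null space
```

Per the paper, these candidate features are then ranked with a second-order criterion before the reward is recovered; that step is omitted from this sketch.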
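None of the three domains requires a dataset download; all are standard RL environments. As one convenient route (our assumption; the paper predates this package and does not prescribe any implementation), Dietterich's Taxi domain [35] is available in Gymnasium as Taxi-v3:

```python
import gymnasium as gym

# Taxi-v3 is a standard re-implementation of Dietterich's Taxi domain [35].
env = gym.make("Taxi-v3")
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # placeholder for an expert policy
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```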
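As a minimal sketch of the quoted ϵ-Boltzmann expert policy (array shapes and the feature encoding are our assumptions; the paper fixes only the functional form and ϵ = 0.1):

```python
import numpy as np

def eps_boltzmann(theta, zeta_s, eps=0.1):
    """pi_{theta,eps}(.|s) = (1 - eps) * softmax_a(theta_a . zeta_s) + eps/|A|.

    theta  : (|A|, k) array, one weight vector per action.
    zeta_s : (k,) encoded state features (current location, passenger
             location, destination location, picked-up flag).
    eps    : uniform-exploration mass (0.1 in the paper's Taxi setup).
    """
    logits = theta @ zeta_s
    logits -= logits.max()                        # numerical stability
    boltzmann = np.exp(logits) / np.exp(logits).sum()
    return (1.0 - eps) * boltzmann + eps / theta.shape[0]

# Example: sample one action in a 6-action, Taxi-like setting
rng = np.random.default_rng(0)
theta = rng.normal(size=(6, 4))                   # hypothetical weights
zeta_s = np.array([0.2, 1.0, 3.0, 0.0])           # hypothetical features
action = rng.choice(6, p=eps_boltzmann(theta, zeta_s))
```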