Compatible Reward Inverse Reinforcement Learning

Authors: Alberto Maria Metelli, Matteo Pirotta, Marcello Restelli

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate CR-IRL against some popular IRL algorithms both in discrete and in continuous domains: the Taxi problem (discrete), the Linear Quadratic Gaussian and the Car on the Hill environments (continuous). We provide here the most significant results; the full data are reported in App. D."
Researcher Affiliation | Academia | Alberto Maria Metelli, DEIB, Politecnico di Milano, Italy (albertomaria.metelli@polimi.it); Matteo Pirotta, SequeL Team, Inria Lille, France (matteo.pirotta@inria.fr); Marcello Restelli, DEIB, Politecnico di Milano, Italy (marcello.restelli@polimi.it)
Pseudocode | Yes | "Alg. 1: CR-IRL algorithm." (A minimal sketch of the algorithm's core feature-construction step appears after the table.)
Open Source Code | No | The paper does not provide concrete access to source code: there are no links to repositories and no explicit statements about code availability.
Open Datasets | Yes | The Taxi domain is defined in [35]; the Linear Quadratic Gaussian regulator in [36]; the continuous Car on the Hill domain in [37]. (An environment-loading example appears after the table.)
Dataset Splits | No | No specific dataset split information (percentages, sample counts, or explicit cross-validation setup) was found. The paper mentions using a number of expert trajectories but not how they are partitioned for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "We assume the expert plays an ϵ-Boltzmann policy with fixed ϵ: $\pi_{\theta,\epsilon}(a \mid s) = (1-\epsilon)\,\frac{e^{\theta_a^\top \zeta_s}}{\sum_{a' \in \mathcal{A}} e^{\theta_{a'}^\top \zeta_s}} + \frac{\epsilon}{|\mathcal{A}|}$, where the policy features $\zeta_s$ are the following state features: current location, passenger location, destination location, and whether the passenger has already been picked up. ... with a Gaussian policy with variance $\sigma^2 = 0.01$. ... a noisy expert's policy in which a random action is selected with probability ϵ = 0.1. ... a radial basis function network." (An illustrative implementation of the ϵ-Boltzmann policy appears below.)
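The paper's Alg. 1 is not reproduced here. As a minimal sketch of its first phase only, assuming the expert's policy has already been fitted and that per-sample score vectors ∇θ log πθ(a|s) and discounted visitation weights have been estimated from the expert trajectories (the function name and interface below are ours, not the paper's):

```python
import numpy as np

def compatible_reward_features(scores, weights):
    """Sketch of CR-IRL's first phase: reward features compatible with
    first-order optimality of the expert's policy.

    scores  : (N, d) array; row t is grad_theta log pi_theta(a_t | s_t)
              at the t-th expert state-action sample (d policy params).
    weights : (N,) array of discounted visitation weights (e.g. gamma^t).

    Returns an (N, N - rank) orthonormal basis of feature valuations phi
    on the samples; for each basis vector the estimated policy gradient
    sum_t weights[t] * scores[t] * phi[t] is zero, so the expert's
    policy is a stationary point of the induced objective.
    """
    weighted = scores * weights[:, None]      # (N, d)
    _, s, vt = np.linalg.svd(weighted.T)      # null space of the (d, N) matrix
    rank = int(np.sum(s > 1e-10))
    return vt[rank:].T                        # basis of the null space
```

Per the paper, these candidate features are then ranked with a second-order criterion before the reward is recovered; that step is omitted from this sketch.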
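None of the three domains requires a dataset download; all are standard RL environments. As one convenient route (our assumption; the paper predates this package and does not prescribe any implementation), Dietterich's Taxi domain [35] is available in Gymnasium as Taxi-v3:

```python
import gymnasium as gym

# Taxi-v3 is a standard re-implementation of Dietterich's Taxi domain [35].
env = gym.make("Taxi-v3")
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # placeholder for an expert policy
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```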
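As a minimal sketch of the quoted ϵ-Boltzmann expert policy (array shapes and the feature encoding are our assumptions; the paper fixes only the functional form and ϵ = 0.1):

```python
import numpy as np

def eps_boltzmann(theta, zeta_s, eps=0.1):
    """pi_{theta,eps}(.|s) = (1 - eps) * softmax_a(theta_a . zeta_s) + eps/|A|.

    theta  : (|A|, k) array, one weight vector per action.
    zeta_s : (k,) encoded state features (current location, passenger
             location, destination location, picked-up flag).
    eps    : uniform-exploration mass (0.1 in the paper's Taxi setup).
    """
    logits = theta @ zeta_s
    logits -= logits.max()                        # numerical stability
    boltzmann = np.exp(logits) / np.exp(logits).sum()
    return (1.0 - eps) * boltzmann + eps / theta.shape[0]

# Example: sample one action in a 6-action, Taxi-like setting
rng = np.random.default_rng(0)
theta = rng.normal(size=(6, 4))                   # hypothetical weights
zeta_s = np.array([0.2, 1.0, 3.0, 0.0])           # hypothetical features
action = rng.choice(6, p=eps_boltzmann(theta, zeta_s))
```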