Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Sub-optimal Experts mitigate Ambiguity in Inverse Reinforcement Learning

Authors: Riccardo Poiani, Gabriele Curti, Alberto Maria Metelli, Marcello Restelli

NeurIPS 2024

Reproducibility Variable: Result (with the supporting LLM response quoted beneath)

Research Type: Experimental
  LLM Response: "We designed an experiment that aims at visualizing the reduction of the feasible reward set. ... We have then run our algorithm for 20 times using ϵ = 0.1 and δ = 0.1, and computed the (empirical) theoretical upper bound on ."

Researcher Affiliation: Academia
  LLM Response: "Riccardo Poiani, DEIB, Politecnico di Milano; Gabriele Curti, DEIB, Politecnico di Milano; Alberto Maria Metelli, DEIB, Politecnico di Milano; Marcello Restelli, DEIB, Politecnico di Milano"

Pseudocode: Yes
  LLM Response: "Algorithm 1: US-IRL-SE Algorithm"

Open Source Code: No
  LLM Response: "The codebase to reproduce the results will be public."

Open Datasets: Yes
  LLM Response: "We considered as environment the forest management scenario with 10 states and 2 actions that is available in the 'pymdptoolbox' library. ... Code can be found at https://github.com/sawcordwell/pymdptoolbox."

Dataset Splits: No
  LLM Response: "The paper does not provide explicit training/validation/test dataset splits. It describes experimental settings for the algorithm (e.g., ϵ and δ for the PAC framework), but no data partitioning for model validation."

Hardware Specification: Yes
  LLM Response: "The experiments have been run on a laptop with an 8-core Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz and 8GB of RAM."

Software Dependencies: Yes
  LLM Response: "We considered as environment the forest management scenario with 10 states and 2 actions that is available in the 'pymdptoolbox' library. ... The master branch at commit 7c96789cc80e280437005c12065cf70266c11636 was used."

Experiment Setup: Yes
  LLM Response: "We considered a discount factor γ = 0.9. ... We have then run our algorithm for 20 times using ϵ = 0.1 and δ = 0.1."