Sub-optimal Experts mitigate Ambiguity in Inverse Reinforcement Learning

Authors: Riccardo Poiani, Gabriele Curti, Alberto Maria Metelli, Marcello Restelli

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We designed an experiment that aims at visualizing the reduction of the feasible reward set. ... We have then run our algorithm for 20 times using ϵ = 0.1 and δ = 0.1, and computed the (empirical) theoretical upper bound on ...
Researcher Affiliation | Academia | Riccardo Poiani (DEIB, Politecnico di Milano) riccardo.poiani@polimi.it; Gabriele Curti (DEIB, Politecnico di Milano) gabriele.curti@mail.polimi.it; Alberto Maria Metelli (DEIB, Politecnico di Milano) albertomaria.metelli@polimi.it; Marcello Restelli (DEIB, Politecnico di Milano) marcello.restelli@polimi.it
Pseudocode | Yes | Algorithm 1 US-IRL-SE Algorithm
Open Source Code | No | The codebase to reproduce the results will be public.
Open Datasets | Yes | We considered as environment the forest management scenarios with 10 states and 2 actions that is available in the 'pymdptoolbox' library. ... Code can be found at https://github.com/sawcordwell/pymdptoolbox.
Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits. It describes experimental settings for the algorithm (e.g., ϵ and δ for the PAC framework), but not data partitioning for model validation.
Hardware Specification | Yes | The experiments have been run on a laptop with 8 Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz and 8GB of RAM.
Software Dependencies | Yes | We considered as environment the forest management scenarios with 10 states and 2 actions that is available in the 'pymdptoolbox' library. ... Master branch at commit 7c96789cc80e280437005c12065cf70266c11636 was used.
Experiment Setup | Yes | We considered a discount factor γ = 0.9. ... We have then run our algorithm for 20 times using ϵ = 0.1 and δ = 0.1
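
A minimal sketch of how the environment quoted under "Open Datasets" and the discount factor quoted under "Experiment Setup" can be set up with pymdptoolbox (installed, for example, from the GitHub repository at the commit listed under "Software Dependencies"). The forest reward parameters and fire probability are the library defaults, not values reported in the paper.

import mdptoolbox.example
import mdptoolbox.mdp

# Forest-management MDP with 10 states and 2 actions ("wait" / "cut").
# P has shape (A, S, S); R has shape (S, A). Uses the library's default
# reward and fire-probability parameters (not stated in the paper).
P, R = mdptoolbox.example.forest(S=10)

# Solve the MDP with the reported discount factor gamma = 0.9 to obtain a
# reference optimal policy (one greedy action per state).
vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)
vi.run()
print(vi.policy)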
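
The experimental protocol quoted under "Research Type" and "Experiment Setup" (20 independent runs with ϵ = 0.1 and δ = 0.1) could be organized as in the hypothetical skeleton below. Only a generic uniform-sampling estimate of the transition model is shown; the per-pair budget N_SAMPLES is a placeholder, and the feasible-reward-set computation of US-IRL-SE (Algorithm 1) is not reproduced here.

import numpy as np
import mdptoolbox.example

# Configuration values reported in the paper (gamma is unused in this sketch).
EPS, DELTA, GAMMA, N_RUNS = 0.1, 0.1, 0.9, 20
N_SAMPLES = 500  # per-(s, a) sampling budget; placeholder, not the paper's bound

P, _ = mdptoolbox.example.forest(S=10)  # true transition model, shape (A, S, S)
A, S, _ = P.shape

def estimate_model(rng):
    """Estimate P by drawing the same number of next-state samples for every (s, a)."""
    P_hat = np.zeros_like(P)
    for a in range(A):
        for s in range(S):
            next_states = rng.choice(S, size=N_SAMPLES, p=P[a, s])
            P_hat[a, s] = np.bincount(next_states, minlength=S) / N_SAMPLES
    return P_hat

errors = []
for seed in range(N_RUNS):
    rng = np.random.default_rng(seed)
    P_hat = estimate_model(rng)
    errors.append(np.abs(P_hat - P).sum(axis=-1).max())  # worst-case L1 error

print(f"mean worst-case L1 error over {N_RUNS} runs: {np.mean(errors):.4f}")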