Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Sub-optimal Experts mitigate Ambiguity in Inverse Reinforcement Learning

Authors: Riccardo Poiani, Gabriele Curti, Alberto Maria Metelli, Marcello Restelli

NeurIPS 2024

Reproducibility Variable: Result (with the supporting LLM response quoted beneath)

Research Type: Experimental
  LLM Response: "We designed an experiment that aims at visualizing the reduction of the feasible reward set. ... We have then run our algorithm for 20 times using ϵ = 0.1 and δ = 0.1, and computed the (empirical) theoretical upper bound on ."

Researcher Affiliation: Academia
  LLM Response: "Riccardo Poiani, DEIB, Politecnico di Milano; Gabriele Curti, DEIB, Politecnico di Milano; Alberto Maria Metelli, DEIB, Politecnico di Milano; Marcello Restelli, DEIB, Politecnico di Milano"

Pseudocode: Yes
  LLM Response: "Algorithm 1: US-IRL-SE Algorithm"

Open Source Code: No
  LLM Response: "The codebase to reproduce the results will be public."

Open Datasets: Yes
  LLM Response: "We considered as environment the forest management scenario with 10 states and 2 actions that is available in the 'pymdptoolbox' library. ... Code can be found at https://github.com/sawcordwell/pymdptoolbox."

Dataset Splits: No
  LLM Response: "The paper does not provide explicit training/validation/test dataset splits. It describes experimental settings for the algorithm (e.g., ϵ and δ for the PAC framework), but no data partitioning for model validation."

Hardware Specification: Yes
  LLM Response: "The experiments have been run on a laptop with an 8-core Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz and 8GB of RAM."

Software Dependencies: Yes
  LLM Response: "We considered as environment the forest management scenario with 10 states and 2 actions that is available in the 'pymdptoolbox' library. ... The master branch at commit 7c96789cc80e280437005c12065cf70266c11636 was used."

Experiment Setup: Yes
  LLM Response: "We considered a discount factor γ = 0.9. ... We have then run our algorithm for 20 times using ϵ = 0.1 and δ = 0.1."