Interactive Inverse Reinforcement Learning for Cooperative Games
Authors: Thomas Kleine Büning, Anne-Marie George, Christos Dimitrakakis
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments support our theoretical results and show that the interactive nature of our setting allows the learning agent to obtain a much better estimate of the reward function (compared to the standard IRL setting). We thus achieve better cooperation by intelligently probing the human's responses. |
| Researcher Affiliation | Academia | (1) Department of Informatics, University of Oslo, Oslo, Norway; (2) Department of Computer Science, University of Neuchatel, Neuchatel, Switzerland; (3) Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden. |
| Pseudocode | Yes | Algorithm 1: Interactive IRL via Linear Programming (a hedged LP sketch follows the table). |
| Open Source Code | Yes | The code is available at https://github.com/Interactive IRL/src. |
| Open Datasets | No | The paper describes two custom environments, "Maze-Maker" and "Random MDPs", which were generated for the experiments. It does not provide access information (link, DOI, citation) for a publicly available or open dataset. (An illustrative random-MDP generator is sketched after the table.) |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test splits. It mentions using "repeatedly generating responses" and averaging results over multiple runs. |
| Hardware Specification | Yes | The experiments were carried out on a virtual machine with 32 CPUs, 60 GB RAM, and the CentOS Linux 8 operating system. |
| Software Dependencies | Yes | The experiments were implemented in Python 3.7 and the libraries matplotlib 3.2.1, numpy 1.20.1, and scipy 1.6.2 (for the linear program) were used. |
| Experiment Setup | Yes | For the case of suboptimal responses and partial information, we assume that A2 responds with Boltzmann-rational policies with inverse temperature β = 10 in both environments. ... We let an episode end with probability 1 − γ = 0.1 each time step... We impose a minimal trajectory length of 2 time steps to prevent vacuous episodes. ... We assume that any attempted move of the cart succeeds with probability 0.8 and that with probability 0.2 the cart moves to a random neighbouring cell. (These mechanics are sketched after the table.) |
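
The Pseudocode row refers to Algorithm 1, Interactive IRL via Linear Programming. The paper's exact LP is not reproduced here; the snippet below is a minimal sketch, under assumed reward features and margin constraints, of how reward weights consistent with a partner's observed responses could be recovered with scipy.optimize.linprog (the solver named in the Software Dependencies row). The constraint construction and all variable names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): estimate reward weights w consistent
# with observed preferred responses by maximising a margin t such that
#   (features of preferred action - features of alternative) @ w >= t.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d = 5                                   # assumed reward-feature dimension
true_w = rng.uniform(-1.0, 1.0, d)      # ground-truth weights, for this demo only

# Synthetic observations: feature differences oriented towards the preferred action.
diffs = rng.normal(size=(40, d))
diffs *= np.sign(diffs @ true_w)[:, None]

# LP variables x = (w_1, ..., w_d, t); minimise -t subject to -diffs @ w + t <= 0.
c = np.zeros(d + 1)
c[-1] = -1.0
A_ub = np.hstack([-diffs, np.ones((len(diffs), 1))])
b_ub = np.zeros(len(diffs))
bounds = [(-1.0, 1.0)] * d + [(0.0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w_hat = res.x[:d]
print("cosine(w_hat, true_w) =",
      w_hat @ true_w / (np.linalg.norm(w_hat) * np.linalg.norm(true_w) + 1e-12))
```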
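
The Open Datasets row notes that the "Random MDPs" environment was generated for the experiments rather than drawn from a public dataset. A common way to generate such instances, shown below purely as an assumption about what a random MDP could look like here, is to sample Dirichlet transition kernels and a reward that is linear in random state features; the paper's actual generator may differ.

```python
# Illustrative random-MDP generator (an assumption, not the paper's generator).
import numpy as np

def random_mdp(n_states=10, n_actions=4, n_features=5, seed=0):
    rng = np.random.default_rng(seed)
    # P[s, a] is a categorical distribution over next states.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    # Reward linear in random state features: r(s) = phi(s) @ w.
    phi = rng.normal(size=(n_states, n_features))
    w = rng.uniform(-1.0, 1.0, n_features)
    return P, phi, phi @ w

P, phi, r = random_mdp()
print(P.shape, phi.shape, r.shape)   # (10, 4, 10) (10, 5) (10,)
```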
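
The Experiment Setup row quotes three mechanics: Boltzmann-rational responses with inverse temperature β = 10, episode termination with probability 1 − γ = 0.1 per time step subject to a minimum length of 2, and cart moves that succeed with probability 0.8 and otherwise land in a random neighbouring cell. The snippet below sketches how these could be simulated for a tabular Q-function; it is a hedged reconstruction from the quoted text, not the released code.

```python
import numpy as np

rng = np.random.default_rng(1)

def boltzmann_response(q_values, beta=10.0):
    """Sample an action from a Boltzmann-rational policy with inverse temperature beta."""
    logits = beta * (np.asarray(q_values) - np.max(q_values))  # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(len(probs), p=probs)

def sample_episode_length(gamma=0.9, min_len=2):
    """Each step continues with probability gamma (ends with 1 - gamma), for at least min_len steps."""
    length = min_len
    while rng.random() < gamma:
        length += 1
    return length

def move_cart(intended_cell, neighbour_cells, success_prob=0.8):
    """The attempted move succeeds with probability 0.8; otherwise the cart slips to a random neighbour."""
    if rng.random() < success_prob:
        return intended_cell
    return neighbour_cells[rng.integers(len(neighbour_cells))]

# Example usage with toy inputs.
print(boltzmann_response([0.1, 0.5, 0.45]))
print(sample_episode_length())
print(move_cart((1, 2), [(0, 2), (2, 2), (1, 1), (1, 3)]))
```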