Inverse Reinforcement Learning in Relational Domains

Authors: Thibaut Munzer, Bilal Piot, Matthieu Geist, Olivier Pietquin, Manuel Lopes

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the proposed approach, experiments have been run to (i) confirm that RCSI can learn a relational reward from demonstrations, (ii) study the influence of the different parameters, and (iii) show that IRL outperforms classification-based imitation learning when dealing with transfer and changes in dynamics.
Researcher Affiliation | Academia | Thibaut Munzer, Inria, Bordeaux, France (thibaut.munzer@inria.fr); Bilal Piot, University Lille 1, Lille, France; Matthieu Geist, Supelec, Metz, France; Olivier Pietquin, University Lille 1, Lille, France; Manuel Lopes, Inria, Bordeaux, France (manuel.lopes@inria.fr)
Pseudocode | No | Figure 1: Sketch of the proposed method: CSI with reward shaping.
Open Source Code | No | No statement about open-source code for their method was found.
Open Datasets | No | From a target reward R, we compute an optimal policy π. The algorithm is given, as expert demonstrations, N_expert trajectories starting from a random state and ending when the (first) wait action is selected. As random demonstrations, the algorithm is given N_random one-step trajectories starting from random states.
Dataset Splits | No | The setting N_random = 300 and N_expert = 15 gives good results, so we will use it in the following experiments.
Hardware Specification | No | No specific hardware details were provided.
Software Dependencies | No | TBRIL ... TILDE [Blockeel and De Raedt, 1998] is an algorithm designed to perform classification and regression over relational data. It is a decision tree learner similar to C4.5 [Quinlan, 1993].
Experiment Setup | Yes | The main parameters are set as follows: 10 trees of maximum depth 4 are learned by TBRIL during the SBC step 1, and the reward is learned with a tree of depth 4, which acts as a regularization parameter.
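The Open Datasets entry describes how demonstrations are generated rather than drawn from a public dataset. A minimal Python sketch of that protocol follows, run on a hypothetical 10-state chain that is not one of the paper's relational domains. The dynamics, `expert_policy` (standing in for the optimal policy π computed from the target reward R), and the uniform choice of actions for the random demonstrations are all assumptions made for illustration; only the counts 15 and 300 mirror the N_expert and N_random setting reported in the table.

```python
import random

# Hypothetical toy domain (not from the paper): states 0..9 on a line,
# actions "left", "right", "wait"; the assumed target reward favours waiting at GOAL.
N_STATES = 10
GOAL = 7
ACTIONS = ("left", "right", "wait")


def step(state, action):
    """Deterministic toy dynamics; 'wait' leaves the state unchanged."""
    if action == "left":
        return max(state - 1, 0)
    if action == "right":
        return min(state + 1, N_STATES - 1)
    return state


def expert_policy(state):
    """Stand-in for the optimal policy pi computed from the target reward R."""
    if state < GOAL:
        return "right"
    if state > GOAL:
        return "left"
    return "wait"


def expert_trajectories(n_expert, rng):
    """N_expert trajectories from random start states, each ending at the first 'wait'."""
    trajs = []
    for _ in range(n_expert):
        s = rng.randrange(N_STATES)
        traj = []
        while True:
            a = expert_policy(s)
            s_next = step(s, a)
            traj.append((s, a, s_next))
            if a == "wait":  # stop as soon as the expert first selects 'wait'
                break
            s = s_next
        trajs.append(traj)
    return trajs


def random_transitions(n_random, rng):
    """N_random one-step trajectories from random states (uniform actions assumed)."""
    out = []
    for _ in range(n_random):
        s = rng.randrange(N_STATES)
        a = rng.choice(ACTIONS)
        out.append([(s, a, step(s, a))])
    return out


rng = random.Random(0)
expert = expert_trajectories(15, rng)   # N_expert = 15
rand = random_transitions(300, rng)     # N_random = 300
```

Each expert trajectory ends with its first `wait`, matching the stopping rule quoted in the table, while each random demonstration is a single transition that exposes the learner to non-expert actions.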
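The Pseudocode entry only points to a figure sketching CSI with reward shaping. As a rough illustration of the cascade behind it, the Python sketch below implements the generic CSI (cascaded supervised IRL) idea under stated assumptions: a score function `q` produced by the classification (SBC) step is turned into reward targets r_hat = q(s, a) - γ · max_b q(s', b), which are then regressed into a reward. The reward shaping used in the paper is not reproduced here, and the toy `q`, the discount `GAMMA = 0.9`, and the tabular averaging that stands in for the depth-4 reward tree are all hypothetical.

```python
from collections import defaultdict

ACTIONS = ("left", "right", "wait")
GAMMA = 0.9  # assumed discount factor; not reported in this summary


def csi_reward_targets(transitions, q, actions=ACTIONS, gamma=GAMMA):
    """Generic CSI step 2: turn a score function q into reward targets.

    For each transition (s, a, s'), r_hat = q(s, a) - gamma * max_b q(s', b).
    """
    targets = []
    for s, a, s_next in transitions:
        r_hat = q(s, a) - gamma * max(q(s_next, b) for b in actions)
        targets.append(((s, a), r_hat))
    return targets


def fit_tabular_reward(targets):
    """Stand-in for the regression step: average the targets for each (s, a).

    The study learns the reward with a tree of depth 4; a lookup table is
    used here only to keep the sketch dependency-free.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, r_hat in targets:
        sums[key] += r_hat
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical score from the classification step: 1 for the expert's action, 0 otherwise.
EXPERT_PAIRS = {(0, "right"), (1, "wait")}


def q(s, a):
    return 1.0 if (s, a) in EXPERT_PAIRS else 0.0


# Two expert transitions plus a repeated non-expert (random) transition.
transitions = [(0, "right", 1), (1, "wait", 1), (0, "left", 0), (0, "left", 0)]
reward = fit_tabular_reward(csi_reward_targets(transitions, q))
```

With this toy score, the expert pairs (0, right) and (1, wait) both receive a target of 1 - 0.9 = 0.1, while the non-expert pair (0, left) receives -0.9, so the regressed reward separates expert from non-expert actions.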