Inverse Reinforcement Learning in Relational Domains
Authors: Thibaut Munzer, Bilal Piot, Matthieu Geist, Olivier Pietquin, Manuel Lopes
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the proposed approach, experiments have been run to (i) confirm RCSI can learn a relational reward from demonstrations, (ii) study the influence of the different parameters and (iii) show that IRL outperforms classification based imitation learning when dealing with transfer and changes in dynamics. |
| Researcher Affiliation | Academia | Thibaut Munzer (Inria, Bordeaux, France; thibaut.munzer@inria.fr); Bilal Piot (University Lille 1, Lille, France); Matthieu Geist (Supelec, Metz, France); Olivier Pietquin (University Lille 1, Lille, France); Manuel Lopes (Inria, Bordeaux, France; manuel.lopes@inria.fr) |
| Pseudocode | No | Figure 1: Sketch of the proposed method: CSI with reward shaping. |
| Open Source Code | No | No statement about open-source code for their method was found. |
| Open Datasets | No | From a target reward R, we compute an optimal policy π. The algorithm is given, as expert demonstrations, Nexpert trajectories starting from a random state and ending when the (first) wait action is selected. As random demonstrations, the algorithm is given Nrandom one-step trajectories starting from random states. |
| Dataset Splits | No | The setting Nrandom = 300 and Nexpert = 15 gives good results and so we will use it in the following experiments. |
| Hardware Specification | No | No specific hardware details were provided. |
| Software Dependencies | No | TBRIL... TILDE [Blockeel and De Raedt, 1998] is an algorithm designed to do classification and regression over relational data. It is a decision tree learner similar to C4.5 [Quinlan, 1993]. |
| Experiment Setup | Yes | The main parameters are set as follows: 10 trees of maximum depth 4 are learned by TBRIL during the SBC step 1 and the reward is learned with a tree of depth 4, which acts as a regularization parameter. |
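
The demonstration protocol quoted in the Open Datasets row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the toy integer-chain domain, the function names, the uniform choice of the random action, the `max_len` safety cap, and the decision to record the final `wait` transition are all assumptions. Only Nexpert = 15, Nrandom = 300, and the "stop at the first wait action" rule come from the text above.

```python
import random

# Values quoted in the table above.
N_EXPERT = 15
N_RANDOM = 300

# Hypothetical toy domain (an assumption, not the paper's relational domains):
# states are the integers 0..5 and the expert walks to state 3, then waits.
ACTIONS = ["inc", "dec", "wait"]


def sample_state(rng):
    """Draw a random start state."""
    return rng.randint(0, 5)


def step(state, action):
    """Deterministic toy dynamics."""
    if action == "inc":
        return min(state + 1, 5)
    if action == "dec":
        return max(state - 1, 0)
    return state  # "wait" leaves the state unchanged


def expert_policy(state):
    """Stand-in for the optimal policy computed from the target reward."""
    if state < 3:
        return "inc"
    if state > 3:
        return "dec"
    return "wait"


def collect_expert(policy, n, rng, max_len=100):
    """n trajectories from random start states, each ending at the first wait."""
    trajectories = []
    for _ in range(n):
        state = sample_state(rng)
        trajectory = []
        for _ in range(max_len):  # assumed cap, guards against a policy that never waits
            action = policy(state)
            next_state = step(state, action)
            trajectory.append((state, action, next_state))
            if action == "wait":  # assumed: the wait transition itself is recorded
                break
            state = next_state
        trajectories.append(trajectory)
    return trajectories


def collect_random(n, rng):
    """n one-step trajectories from random states (uniform action is an assumption)."""
    samples = []
    for _ in range(n):
        state = sample_state(rng)
        action = rng.choice(ACTIONS)
        samples.append((state, action, step(state, action)))
    return samples


rng = random.Random(0)
expert_demos = collect_expert(expert_policy, N_EXPERT, rng)
random_demos = collect_random(N_RANDOM, rng)
```

Both collectors return `(state, action, next_state)` tuples, so the two data sets can be passed to the same downstream learner. Swapping in a real relational simulator would only require replacing `sample_state`, `step`, and `expert_policy`.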