Maximum Entropy Semi-Supervised Inverse Reinforcement Learning
Authors: Julien Audiffren, Michal Valko, Alessandro Lazaric, Mohammad Ghavamzadeh
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in highway driving and grid-world problems indicate that MESSI is able to take advantage of the unsupervised trajectories and improve the performance of MaxEnt-IRL. |
| Researcher Affiliation | Collaboration | CMLA UMR 8536, ENS Cachan; SequeL team, INRIA Lille; SequeL team, INRIA Lille; Adobe Research & INRIA Lille |
| Pseudocode | Yes | Algorithm 1: MESSI (MaxEnt SSIRL) |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology described, nor does it state that the code is released or available in supplementary materials. |
| Open Datasets | No | The paper mentions using 'expert trajectories' and 'unsupervised trajectories' generated from distributions like Pμ1, Pμ2, Pμ3, but it does not specify concrete access information (e.g., a link, DOI, or formal citation for a publicly available dataset) for the data used in the experiments. It describes the characteristics of the generated data rather than providing access to a pre-existing dataset. |
| Dataset Splits | No | The paper refers to 'l expert trajectories' and 'u unsupervised trajectories' and discusses using these in the learning process, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) typically used for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Parameters. For each of the experiments, the default parameters are θmax = 500, λ0 = 0.05, the number of iterations of gradient descent is set to T = 100, one expert trajectory is provided (l = 1), and the number of unsupervised trajectories is set to u = 20 with ν = 0.5. |
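
Since no source code is released, anyone reproducing the experiments may find it convenient to record the reported defaults in one place. The following is a minimal sketch of such a configuration, not the authors' code: the class name `MessiDefaults`, its field names, and the helper `with_overrides` are hypothetical, and only the numeric values come from the setup row above. The roles of θmax, λ0, and ν are not described in this summary, so the comments do not attempt to define them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MessiDefaults:
    """Default experiment parameters as reported in the setup row.

    All names here are hypothetical; only the values come from the paper.
    """
    theta_max: float = 500.0   # θmax (role not described in this summary)
    lambda_0: float = 0.05     # λ0 (role not described in this summary)
    n_iterations: int = 100    # T, number of gradient descent iterations
    n_expert: int = 1          # l, number of expert trajectories
    n_unsupervised: int = 20   # u, number of unsupervised trajectories
    nu: float = 0.5            # ν (role not described in this summary)


DEFAULTS = MessiDefaults()


def with_overrides(**changes) -> MessiDefaults:
    """Return a copy of the defaults with selected fields changed."""
    return replace(DEFAULTS, **changes)


if __name__ == "__main__":
    # Example: vary only the number of unsupervised trajectories.
    print(with_overrides(n_unsupervised=50))
```

Making the dataclass frozen keeps the reported defaults immutable, so each experimental variation is an explicit copy rather than an in-place change.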