Maximum Entropy Semi-Supervised Inverse Reinforcement Learning

Authors: Julien Audiffren, Michal Valko, Alessandro Lazaric, Mohammad Ghavamzadeh

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results in highway driving and grid-world problems indicate that MESSI is able to take advantage of the unsupervised trajectories and improve the performance of MaxEnt-IRL.
Researcher Affiliation | Collaboration | CMLA UMR 8536, ENS Cachan; SequeL team, INRIA Lille; SequeL team, INRIA Lille; Adobe Research & INRIA Lille
Pseudocode | Yes | Algorithm 1: MESSI (MaxEnt SSIRL)
Open Source Code | No | The paper does not provide a direct link to the source code for the methodology described, nor does it state that the code is released or available in supplementary materials.
Open Datasets | No | The paper mentions using 'expert trajectories' and 'unsupervised trajectories' generated from distributions like Pμ1, Pμ2, Pμ3, but it does not specify concrete access information (e.g., a link, DOI, or formal citation for a publicly available dataset) for the data used in the experiments. It describes the characteristics of the generated data rather than providing access to a pre-existing dataset.
Dataset Splits | No | The paper refers to 'l expert trajectories' and 'u unsupervised trajectories' and discusses using these in the learning process, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) typically used for reproducibility.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Parameters. For each of the experiments, the default parameters are θmax = 500, λ0 = 0.05, the number of iterations of gradient descent is set to T = 100, one expert trajectory is provided (l = 1), and the number of unsupervised trajectories is set to u = 20 with ν = 0.5.
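
For anyone attempting to reproduce the experiments, the default parameters quoted above could be collected in a single configuration object. The following is only a minimal sketch: the class name, field names, and the `with_overrides` helper are hypothetical and do not come from the paper, and the roles of θmax, λ0, and ν are not defined in this summary, so the comments only map each field back to the quoted symbol.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MessiDefaults:
    """Hypothetical container for the default values quoted in the summary."""
    theta_max: float = 500.0    # θmax = 500
    lambda_0: float = 0.05      # λ0 = 0.05
    n_iterations: int = 100     # T = 100 gradient descent iterations
    n_expert: int = 1           # l = 1 expert trajectory
    n_unsupervised: int = 20    # u = 20 unsupervised trajectories
    nu: float = 0.5             # ν = 0.5


DEFAULTS = MessiDefaults()


def with_overrides(**changes) -> MessiDefaults:
    """Return a copy of the defaults with selected fields changed."""
    return replace(DEFAULTS, **changes)
```

A frozen dataclass is used here so that a variation, for example `with_overrides(n_unsupervised=50)`, produces a new object and leaves the quoted defaults untouched.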