Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise

Authors: Jiangchuan Zheng, Siyuan Liu, Lionel M. Ni

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on both synthetic data and real vehicle routing data with noticeable behavior noise show significant improvement of our method over previous approaches in learning accuracy, and also show its power in de-noising behavior data." "We carry out comparative experiments on both synthetic grid world-based data and real-world vehicle routing data to show the accuracy of our method in reward and policy learning from noisy demonstration data, as well as its ability of de-noising behavior data. Figure 2 shows the comparison results."
Researcher Affiliation | Academia | Jiangchuan Zheng (1), Siyuan Liu (3), and Lionel M. Ni (1,2); (1) Department of Computer Science and Engineering and (2) Guangzhou HKUST Fok Ying Tung Research Institute, The Hong Kong University of Science and Technology, Hong Kong, China; (3) Heinz College, Carnegie Mellon University, Pittsburgh, USA. Emails: jczheng@cse.ust.hk, siyuan@cmu.edu, ni@cse.ust.hk
Pseudocode | Yes | Algorithm 1: Robust Bayesian IRL (RBIRL)
Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology or a link to a code repository.
Open Datasets | No | For the synthetic data, the paper describes generating its own data: "For a particular grid world setting, we generate negative rewards drawn randomly from i.i.d. Gaussian priors N(0, 100)." For the real data, it states: "We apply our framework to a real-world taxi trajectory data set collected from Shenzhen, China." However, it does not provide concrete access information (a link, DOI, or formal citation) indicating that this real-world dataset is publicly available.
Dataset Splits | No | The paper mentions training and testing sets, for example: "with 15% containing no unoccupied trajectories withheld as the testing set." However, it does not specify a validation dataset split or a cross-validation setup.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, memory, or cloud instance specifications).
Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments.
Experiment Setup | Yes | "For parameter setting, we set β2 to 1.5 and η to 0.5 after careful tuning." A hedged sketch reconstructing this setup, together with the synthetic-data generation quoted under Open Datasets and the split described under Dataset Splits, follows the table.
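The rows above quote only fragments of the setup, but they are concrete enough for an illustrative sketch. The Python snippet below is a hypothetical reconstruction, not the authors' code: it samples the N(0, 100) grid-world rewards quoted under Open Datasets, records the reported hyperparameters β2 = 1.5 and η = 0.5, and withholds 15% of trajectories as a test set as quoted under Dataset Splits. The grid size, the handling of "negative rewards", the roles of β2 and η, and the split procedure are all assumptions.

```python
import numpy as np

# Hypothetical reconstruction of the quoted experiment setup.
# The paper releases no code, so every name and constant below that is not
# quoted in the table (grid size, RNG seed, sign handling) is an assumption.

rng = np.random.default_rng(0)

# "we generate negative rewards drawn randomly from i.i.d. Gaussian priors N(0, 100)"
GRID_SIZE = 10                                             # assumed grid dimension
n_states = GRID_SIZE * GRID_SIZE
rewards = rng.normal(0.0, np.sqrt(100.0), size=n_states)   # assumes 100 is the variance
rewards = -np.abs(rewards)                                 # assumption: enforce "negative rewards"

# "we set β2 to 1.5 and η to 0.5 after careful tuning" (their roles are not
# described in this table, so they are kept here only as named constants)
beta2 = 1.5
eta = 0.5

# "15% ... withheld as the testing set"; no validation split is reported.
def train_test_split(trajectories, test_frac=0.15, rng=rng):
    """Randomly withhold a fraction of demonstration trajectories for testing."""
    idx = rng.permutation(len(trajectories))
    n_test = int(round(test_frac * len(trajectories)))
    test = [trajectories[i] for i in idx[:n_test]]
    train = [trajectories[i] for i in idx[n_test:]]
    return train, test
```

This scaffolding stops short of the actual RBIRL inference (Algorithm 1 in the paper), which the table notes is available only as pseudocode.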