Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise
Authors: Jiangchuan Zheng, Siyuan Liu, Lionel M. Ni
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic data and real vehicle routing data with noticeable behavior noise show significant improvement of our method over previous approaches in learning accuracy, and also show its power in de-noising behavior data. We carry out comparative experiments on both synthetic grid world-based data and real-world vehicle routing data to show the accuracy of our method in reward and policy learning from noisy demonstration data, as well as its ability of de-noising behavior data. Figure 2 shows the comparison results. |
| Researcher Affiliation | Academia | Jiangchuan Zheng¹, Siyuan Liu³, and Lionel M. Ni¹˒²; ¹Department of Computer Science and Engineering, ²Guangzhou HKUST Fok Ying Tung Research Institute, The Hong Kong University of Science and Technology, Hong Kong, China; ³Heinz College, Carnegie Mellon University, Pittsburgh, USA; jczheng@cse.ust.hk, siyuan@cmu.edu, ni@cse.ust.hk |
| Pseudocode | Yes | Algorithm 1: Robust Bayesian IRL (RBIRL) |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | No | For synthetic data, the paper describes generating its own data: "For a particular grid world setting, we generate negative rewards drawn randomly from i.i.d. Gaussian priors N(0, 100)." For real data, it states: "We apply our framework to a real-world taxi trajectory data set collected from Shenzhen, China". However, it provides no concrete access information (link, DOI, or formal citation) showing that this real-world dataset is publicly available. |
| Dataset Splits | No | The paper mentions training and testing sets, for example, "with 15% containing no unoccupied trajectories withheld as the testing set." However, it does not specify a validation dataset split or a cross-validation setup. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, memory, or cloud instance specifications). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | For parameter setting, we set β2 to 1.5 and η to 0.5 after careful tuning. |