An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
Authors: Dhruv Malik, Malayandi Palaniappan, Jaime Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca Dragan
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that our method helps scale POMDP solvers to CIRL games with larger reward parameter and action spaces. We find a speedup of several orders of magnitude for exact methods, and substantial improvements in value for approximate methods. (Section 1, Contributions, point 2, and Section 6, Experiments). |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Adapted Value Iteration for CIRL Games (Section 3.3) |
| Open Source Code | No | The paper describes algorithms and pseudocode, and references appendices for adapted PBVI and POMCP algorithms, but does not provide a concrete access link or explicit statement of public source code release in the main text. |
| Open Datasets | No | Domain: Our experimental domain is based on our running example from Section 1. Assume there are m recipes and n ingredients. The state space is an n-tuple representing the quantity of each ingredient prepared thus far. (Section 6.1). No concrete access information or citation to a public dataset is provided. |
| Dataset Splits | No | The paper mentions 'training R' and '30,000 samples' but does not specify explicit dataset splits (e.g., percentages or counts for train/validation/test sets). |
| Hardware Specification | No | For the simpler problems... exact VI failed to solve the problem after depleting our system's 16GB memory (Section 6.2, Exact VI). This mentions memory but no specific CPU/GPU models or detailed hardware setup. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes 'Manipulated Variables' and 'Dependent Measures' for experiments and mentions running POMCP with '30,000 samples' or '500,000 samples', but it lacks specific hyperparameter values (e.g., learning rate, batch size, optimizer settings) or detailed model initialization/training schedules. |