Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions
Authors: Sinong Geng, Houssam Nassif, Carlos Manzanares, Max Reppen, Ronnie Sircar
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the performance of PQR is demonstrated by synthetic and real-world datasets. |
| Researcher Affiliation | Collaboration | Department of Computer Science, Princeton University, Princeton, New Jersey, USA; Amazon, Seattle, Washington, USA; Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey, USA. |
| Pseudocode | Yes | Algorithm 1 Reward Estimation (RE), Algorithm 2 FQI-I, Algorithm 3 PQR Algorithm (see the first sketch after this table). |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | Yes | We combine public data from the Official Aviation Guide (https://www.oag.com/) for scheduled flight information, with public data from the Bureau of Transportation Statistics (https://www.transtats.bts.gov) for airline company information. |
| Dataset Splits | No | The paper mentions 'training data' but does not provide specific details on training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit split files). |
| Hardware Specification | No | The Acknowledgements section mentions 'Amazon Web Services for providing computational resources' but does not specify any particular hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using 'deep neural networks' and specific algorithms like 'soft Q-learning' and 'fitted-Q-iteration', but it does not provide specific version numbers for any software libraries or dependencies used (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | For simulation, we build an MDP environment... We take δ = 0.9 and α = 1 and solve the MDP with a deep energy-based policy by the soft Q-learning method in Haarnoja et al. (2017) (see the second sketch after this table). |
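
Since the Pseudocode row names the paper's three-stage decomposition (Reward Estimation, FQI-I, and the overall PQR algorithm), a compact sketch may help readers assessing reproducibility. The following is a minimal tabular rendering, assuming a finite entropy-regularized MDP with a known transition kernel `P` and an anchor action whose reward is identically zero; the names `p_step`/`q_step`/`r_step` and the count-based policy estimate are illustrative stand-ins for the paper's deep estimators, not the authors' code.

```python
import numpy as np

def soft_value(Q, alpha):
    """Soft (entropy-regularized) value: V(s) = alpha * log sum_a exp(Q(s,a)/alpha)."""
    m = Q.max(axis=1)
    return m + alpha * np.log(np.exp((Q - m[:, None]) / alpha).sum(axis=1))

def p_step(pairs, n_states, n_actions):
    """P-step: estimate the behavior policy pi(a|s). The paper fits a deep
    density estimator; smoothed empirical counts stand in for it here."""
    counts = np.ones((n_states, n_actions))  # Laplace smoothing avoids log(0)
    for s, a in pairs:
        counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def q_step(pi, P, anchor, alpha, delta, n_iters=500):
    """Q-step: pi(a|s) ∝ exp(Q(s,a)/alpha) pins Q down only up to a per-state
    shift; the anchor action a_A with known reward r(s, a_A) = 0 identifies
    that shift via Q(s, a_A) = delta * E[V(s') | s, a_A], iterated to a
    fixed point (the role played by the paper's FQI-I)."""
    adv = alpha * (np.log(pi) - np.log(pi[:, [anchor]]))  # Q(s,a) - Q(s, a_A)
    q_anchor = np.zeros(pi.shape[0])
    for _ in range(n_iters):
        v = soft_value(adv + q_anchor[:, None], alpha)
        q_anchor = delta * P[:, anchor, :] @ v  # Bellman update at the anchor
    return adv + q_anchor[:, None]

def r_step(Q, P, alpha, delta):
    """R-step: invert the soft Bellman equation,
    r(s,a) = Q(s,a) - delta * E[V(s') | s, a]."""
    return Q - delta * np.einsum('sap,p->sa', P, soft_value(Q, alpha))

# Toy usage: 3 states, 2 actions, action 0 as the anchor.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))            # transition kernel P[s, a, s']
pi = p_step([(0, 1), (1, 0), (2, 1), (0, 0)], 3, 2)   # from observed (s, a) pairs
Q = q_step(pi, P, anchor=0, alpha=1.0, delta=0.9)
R = r_step(Q, P, alpha=1.0, delta=0.9)                # recovered reward
```

The `q_step` update is a contraction for δ < 1, which is what lets the zero-reward anchor assumption pin down the per-state shift that the observed policy alone leaves unidentified.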
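
The Experiment Setup row quotes δ = 0.9 and α = 1 with soft Q-learning (Haarnoja et al., 2017). A tabular soft value iteration conveys the fixed point that the deep, sample-based version approximates; `R` and `P` below are hypothetical inputs, and this is a sketch under those assumptions rather than the paper's implementation.

```python
import numpy as np

def soft_q_iteration(R, P, delta=0.9, alpha=1.0, n_iters=500):
    """Iterate the soft Bellman operator Q <- R + delta * E[V(s')], with
    V(s) = alpha * log sum_a exp(Q(s,a)/alpha); the optimal policy is
    energy-based, pi(a|s) ∝ exp(Q(s,a)/alpha)."""
    Q = np.zeros_like(R)
    for _ in range(n_iters):
        m = Q.max(axis=1)  # stabilized log-sum-exp
        V = m + alpha * np.log(np.exp((Q - m[:, None]) / alpha).sum(axis=1))
        Q = R + delta * np.einsum('sap,p->sa', P, V)
    pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / alpha)
    return Q, pi / pi.sum(axis=1, keepdims=True)

# e.g. Q, pi = soft_q_iteration(R=np.zeros((3, 2)), P=np.full((3, 2, 3), 1 / 3))
```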