Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions

Authors: Sinong Geng, Houssam Nassif, Carlos Manzanares, Max Reppen, Ronnie Sircar

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the performance of PQR is demonstrated by synthetic and real-world datasets.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, Princeton University, Princeton, New Jersey, USA; (2) Amazon, Seattle, Washington, USA; (3) Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey, USA.
Pseudocode | Yes | Algorithm 1 Reward Estimation (RE), Algorithm 2 FQI-I, Algorithm 3 PQR Algorithm. (A sketch of these three stages appears after the table.)
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | We combine public data from the Official Aviation Guide (https://www.oag.com/) for scheduled flight information, with public data from the Bureau of Transportation Statistics (https://www.transtats.bts.gov) for airline company information.
Dataset Splits | No | The paper mentions 'training data' but does not provide specific details on training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit split files).
Hardware Specification | No | The Acknowledgements section mentions 'Amazon Web Services for providing computational resources' but does not specify any particular hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions using 'deep neural networks' and specific algorithms like 'soft Q-learning' and 'fitted-Q-iteration', but it does not provide specific version numbers for any software libraries or dependencies used (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | For simulation, we build an MDP environment... We take δ = 0.9 and α = 1 and solve the MDP with a deep energy-based policy by the soft Q-learning method in Haarnoja et al. (2017). (A simulation sketch using these values appears after the table.)
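The pseudocode row above names the paper's three algorithms (Reward Estimation, FQI-I, and PQR). Since no code is released, the following is only a minimal tabular sketch of how that three-stage pipeline could be organized, assuming discrete states and actions, an anchor action whose reward is fixed to zero, and the standard entropy-regularized (soft) MDP identities pi(a|s) proportional to exp(Q(s,a)/alpha) and V(s) = Q(s,a_A) - alpha*log pi(a_A|s). All function and variable names are hypothetical, and empirical counts stand in for the deep estimators used in the paper.

```python
import numpy as np

def estimate_policy(transitions, n_states, n_actions):
    # Stage 1: estimate pi(a|s) from (s, a, s') tuples; empirical counts
    # stand in for the paper's deep density estimator.
    counts = np.ones((n_states, n_actions))              # Laplace smoothing
    for s, a, _ in transitions:
        counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def estimate_transition(transitions, n_states, n_actions):
    # Empirical model of P(s'|s,a), used to take expectations over next states.
    counts = np.ones((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    return counts / counts.sum(axis=2, keepdims=True)

def fqi_i(pi, P, anchor, delta, alpha, n_iters=500):
    # Stage 2 (FQI-I-style identification): with the anchor reward pinned to 0,
    # Q(s, a_A) is the fixed point of
    #   Q(s, a_A) = delta * E_{s'|s,a_A}[ Q(s', a_A) - alpha * log pi(a_A|s') ].
    q_anchor = np.zeros(P.shape[0])
    for _ in range(n_iters):
        v_next = q_anchor - alpha * np.log(pi[:, anchor])   # V(s') in the soft MDP
        q_anchor = delta * (P[:, anchor, :] @ v_next)
    return q_anchor

def pqr(transitions, n_states, n_actions, anchor=0, delta=0.9, alpha=1.0):
    # Stage 3: recover Q for all actions, then the reward.
    pi = estimate_policy(transitions, n_states, n_actions)
    P = estimate_transition(transitions, n_states, n_actions)
    q_anchor = fqi_i(pi, P, anchor, delta, alpha)
    # Soft-policy identity: Q(s,a) - Q(s,a_A) = alpha * (log pi(a|s) - log pi(a_A|s)).
    q = q_anchor[:, None] + alpha * (np.log(pi) - np.log(pi[:, [anchor]]))
    v = q_anchor - alpha * np.log(pi[:, anchor])
    # Soft Bellman equation: R(s,a) = Q(s,a) - delta * E_{s'|s,a}[V(s')].
    r = q - delta * (P @ v)
    return r, q, pi
```

In the paper, the policy is fit with a deep estimator and the Q- and reward-functions with neural function approximators over continuous states; the tabular version above only illustrates the identification role played by the anchor action.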
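The experiment-setup row quotes δ = 0.9 and α = 1 and a deep energy-based demonstration policy trained with soft Q-learning (Haarnoja et al., 2017). The snippet below is a hedged stand-in for that simulation step in the same tabular setting as the sketch above: soft value iteration replaces deep soft Q-learning, and demonstrations are sampled from the Boltzmann policy pi(a|s) proportional to exp(Q(s,a)/alpha). The reward matrix R, transition tensor P, and all names are hypothetical inputs, not the paper's environment.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_value_iteration(R, P, delta=0.9, alpha=1.0, n_iters=500):
    # Soft Bellman backups: V(s) = alpha * log sum_a exp(Q(s,a)/alpha),
    # Q(s,a) = R(s,a) + delta * E_{s'|s,a}[V(s')].  R: (S, A), P: (S, A, S).
    Q = np.zeros_like(R)
    for _ in range(n_iters):
        V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
        Q = R + delta * (P @ V)
    return Q

def sample_demonstrations(Q, P, alpha=1.0, horizon=1000, s0=0):
    # Energy-based policy pi(a|s) proportional to exp(Q(s,a)/alpha), rolled out
    # through the dynamics to produce (s, a, s') demonstration tuples.
    pi = np.exp(Q / alpha)
    pi /= pi.sum(axis=1, keepdims=True)
    s, demos = s0, []
    for _ in range(horizon):
        a = rng.choice(Q.shape[1], p=pi[s])
        s_next = rng.choice(P.shape[2], p=P[s, a])
        demos.append((s, a, s_next))
        s = s_next
    return demos
```

Demonstrations generated this way (with the true R set to zero at the anchor action) could be fed to the pqr sketch above and the recovered reward compared against R, up to estimation error.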