Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning
Authors: Sanket Shah, Arunesh Sinha, Pradeep Varakantham, Andrew Perrault, Milind Tambe
AAAI 2020, pp. 2226-2235
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our solution allows us to significantly reduce screenee wait time without compromising on the risk. <...> Finally, our third contribution is a set of experiments that reveal why and how prior TSG models fail to handle realistic continuous arrival of passengers in bursts. The experiments also show that our approach achieves the same risk as prior models but improves upon the average delay by 100% in the best case and 25% on average. |
| Researcher Affiliation | Academia | 1 School of Information Systems, Singapore Management University, {sankets, aruneshs, pradeepv}@smu.edu.sg; 2 Center for Research on Computation and Society, Harvard University, aperrault@g.harvard.edu, milind_tambe@harvard.edu |
| Pseudocode | No | The paper describes algorithms and methods but does not provide formal pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | We construct our problem instances using the description in Brown et al. (2016) and McCarthy, Vayanos, and Tambe (2017). <...> We combine this with real flight departure times taken from one of the busiest airports in the world to generate a realistic arrival distribution of passengers. The paper refers to other works for problem instance construction and mentions using "real flight departure times" but does not provide a link, DOI, or formal citation for accessing this specific dataset. |
| Dataset Splits | No | The paper mentions "training steps" and "convergence" of the DDPG actor network but does not specify details on training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | To perform a fair comparison to past work, we run all our experiments on a CPU. This is a general statement and does not provide specific CPU models, memory, or other detailed hardware specifications. |
| Software Dependencies | No | We use the Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al. 2015) algorithm that is a state-of-the-art technique in Deep Reinforcement Learning literature. <...> In practice, these gradients need not be explicitly calculated and can be handled by automatic symbolic differentiation libraries (Abadi et al. 2015) instead. The paper mentions DDPG and refers to TensorFlow (Abadi et al. 2015) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | No | We use the Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al. 2015) algorithm <...> We choose 10,000 training steps as the number of steps for convergence. The paper describes the RL algorithm used and mentions the number of training steps, but it lacks specific hyperparameter values (e.g., learning rate, batch size, network architecture details) required for full reproducibility of the experimental setup (see the illustrative sketch after this table). |
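
Since the paper publishes neither code nor hyperparameters, the snippet below is only an illustrative, hypothetical sketch of the kind of constrained-action handling a DDPG actor for screening allocation might involve; the function name, the box-style capacity constraint, and all numbers are assumptions and do not come from the paper.

```python
import numpy as np

# Hypothetical sketch (not the authors' implementation): map an unconstrained
# DDPG actor output to a non-negative allocation that respects per-resource
# screening capacities. The capacities and action dimension are illustrative.

def project_to_capacities(raw_action, capacities):
    """Clip a raw actor output to a feasible allocation: no negative
    allocations, and no resource used beyond its capacity."""
    positive = np.maximum(raw_action, 0.0)   # rule out negative allocations
    return np.minimum(positive, capacities)  # cap each entry at its capacity

rng = np.random.default_rng(0)
raw = rng.normal(size=4)                      # stand-in for an actor output
caps = np.array([10.0, 5.0, 8.0, 3.0])        # assumed screening capacities
print(project_to_capacities(raw, caps))
```

A full reproduction would additionally need the actor/critic architectures, learning rates, batch size, replay-buffer settings, and the exact constraint set, none of which are reported in the paper.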