Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning
Authors: Sanket Shah, Arunesh Sinha, Pradeep Varakantham, Andrew Perrault, Milind Tambe
AAAI 2020, pp. 2226-2235
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our solution allows us to significantly reduce screenee wait time without compromising on the risk. <...> Finally, our third contribution is a set of experiments that reveal why and how prior TSG models fail to handle realistic continuous arrival of passengers in bursts. The experiments also show that our approach achieves the same risk as prior models but improves upon the average delay by 100% in the best case and 25% on average. |
| Researcher Affiliation | Academia | 1 School of Information Systems, Singapore Management University, {sankets, aruneshs, pradeepv}@smu.edu.sg; 2 Center for Research on Computation and Society, Harvard University, aperrault@g.harvard.edu, milind_tambe@harvard.edu |
| Pseudocode | No | The paper describes algorithms and methods but does not provide formal pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | We construct our problem instances using the description in Brown et al. (2016) and McCarthy, Vayanos, and Tambe (2017). <...> We combine this with real flight departure times taken from one of the busiest airports in the world to generate a realistic arrival distribution of passengers. The paper refers to other works for problem instance construction and mentions using "real flight departure times" but does not provide a link, DOI, or formal citation for accessing this specific dataset. |
| Dataset Splits | No | The paper mentions "training steps" and "convergence" of the DDPG actor network but does not specify details on training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | To perform a fair comparison to past work, we run all our experiments on a CPU. This is a general statement and does not provide specific CPU models, memory, or other detailed hardware specifications. |
| Software Dependencies | No | We use the Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al. 2015) algorithm that is a state-of-the-art technique in Deep Reinforcement Learning literature. <...> In practice, these gradients need not be explicitly calculated and can be handled by automatic symbolic differentiation libraries (Abadi et al. 2015) instead. The paper mentions DDPG and refers to TensorFlow (Abadi et al. 2015) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | No | We use the Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al. 2015) algorithm <...> We choose 10,000 training steps as the number of steps for convergence. The paper describes the RL algorithm used and mentions the number of training steps, but it lacks specific hyperparameter values (e.g., learning rate, batch size, network architecture details) required for full reproducibility of the experimental setup (see the illustrative sketch after this table). |
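
Since the paper publishes neither code nor hyperparameters, the snippet below is only an illustrative, hypothetical sketch of the kind of constrained-action handling a DDPG actor for screening allocation might involve; the function name, the box-style capacity constraint, and all numbers are assumptions and do not come from the paper.

```python
import numpy as np

# Hypothetical sketch (not the authors' implementation): map an unconstrained
# DDPG actor output to a non-negative allocation that respects per-resource
# screening capacities. The capacities and action dimension are illustrative.

def project_to_capacities(raw_action, capacities):
    """Clip a raw actor output to a feasible allocation: no negative
    allocations, and no resource used beyond its capacity."""
    positive = np.maximum(raw_action, 0.0)   # rule out negative allocations
    return np.minimum(positive, capacities)  # cap each entry at its capacity

rng = np.random.default_rng(0)
raw = rng.normal(size=4)                      # stand-in for an actor output
caps = np.array([10.0, 5.0, 8.0, 3.0])        # assumed screening capacities
print(project_to_capacities(raw, caps))
```

A full reproduction would additionally need the actor/critic architectures, learning rates, batch size, replay-buffer settings, and the exact constraint set, none of which are reported in the paper.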