Goal Recognition as Reinforcement Learning
Authors: Leonardo Amado, Reuth Mirsky, Felipe Meneguzzi
AAAI 2022, pp. 9644-9651
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a first instance of this framework using tabular Q-learning for the learning stage, as well as three mechanisms for the inference stage. The resulting instantiation achieves state-of-the-art performance against goal recognizers on standard evaluation domains and superior performance in noisy environments. Evaluation of the new framework on domains with partial and noisy observability. Experiments show that even with a very short learning process, we can still accurately and robustly perform GR on challenging problems. We use standard machine learning metrics in our evaluation: accuracy, precision, recall, and F-score. |
| Researcher Affiliation | Academia | Leonardo Amado (1), Reuth Mirsky (2), Felipe Meneguzzi (3,1); (1) Pontifícia Universidade Católica do Rio Grande do Sul, Brazil; (2) Bar-Ilan University, Israel, and The University of Texas at Austin, USA; (3) University of Aberdeen, Scotland |
| Pseudocode | Yes | Algorithm 1: Learn a Q-function for each goal; Algorithm 2: Infer most likely goal for the observations (hedged sketches of both stages follow this table) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that their own source code is open or available. |
| Open Datasets | Yes | To be able to compare GRAQL and planning-based GR, we use PDDLGym (Silver and Chitnis 2020) as our evaluation environment. (A minimal PDDLGym usage sketch follows this table.) |
| Dataset Splits | No | The paper describes how it generates its test problems, including variants with different observability and noise levels, but it does not specify explicit training/validation/test dataset splits with percentages or counts for reproducibility in the conventional machine learning sense. The Q-learning process is described in terms of 'episodes' rather than dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions specific algorithms and tools (e.g., 'off-the-shelf model-free Q-learning algorithm (Sutton 1988)', 'PDDLGym (Silver and Chitnis 2020)', 'LAMA (Richter and Westphal 2010)', 'Fast Downward (Helmert 2006)') but does not specify version numbers for these or other software dependencies, which would be required for reproducible setup. |
| Experiment Setup | Yes | For the learning stage of our experiments, we use an off-the-shelf model-free Q-learning algorithm (Sutton 1988). For each goal, we run the learner for a fixed number of episodes, whether it reaches convergence or not. We evaluate greedy policy executions after training the Q-functions for 500, 10k, and 30k episodes. Since the performance of these training regimes does not vary much, we report our empirical results with a consistent value of 500 training episodes. We set the reward for reaching the goal to 100, and 0 otherwise, and the discount factor to 0.9. As exploration is more important in this case than maximizing the reward, the sampling strategy we use is ϵ-greedy with linearly decaying values (ϵ = 1…0.01). For the DP measure, we use a threshold probability for divergence of δ = 0.1. (These hyperparameters are echoed in the learning-stage sketch after this table.) |
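Since the paper evaluates on PDDLGym, a minimal interaction sketch may help situate the setup. The snippet below follows the interaction pattern documented in the PDDLGym README; the environment id `PDDLEnvSokoban-v0`, the step cap, and the exact return signatures are assumptions that may differ across PDDLGym versions and from the domains actually used in the paper.

```python
# Minimal PDDLGym interaction sketch (assumed API; check your installed version).
import pddlgym

# The environment id below is illustrative; the paper's evaluation domains may differ.
env = pddlgym.make("PDDLEnvSokoban-v0")
obs, debug_info = env.reset()

for _ in range(100):  # cap steps; random rollouts rarely reach the goal
    # PDDLGym samples actions conditioned on the current state, since the
    # set of applicable ground actions depends on it.
    action = env.action_space.sample(obs)
    obs, reward, done, debug_info = env.step(action)
    if done:
        break
```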
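The Pseudocode and Experiment Setup rows above outline the learning stage: one tabular Q-function is trained per candidate goal. The sketch below is a minimal illustration under the quoted hyperparameters (500 episodes, goal reward 100, discount 0.9, ϵ decaying linearly from 1 to 0.01); the environment interface (`reset`, `valid_actions`, `step`), the goal test, the learning rate `alpha`, and the per-episode step cap are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the learning stage (Algorithm 1): one tabular
# Q-function per candidate goal. Hyperparameters follow the Experiment Setup
# row; the environment interface, learning rate, and step cap are assumptions.
import random
from collections import defaultdict

def learn_q_for_goal(env, goal_reached, episodes=500, gamma=0.9, alpha=0.1,
                     eps_start=1.0, eps_end=0.01, max_steps=250):
    """Train a tabular Q-function toward one goal.

    `env` is an assumed interface with reset() -> state, valid_actions(state),
    and step(state, action) -> next_state; states must be hashable.
    `goal_reached(state)` tests whether the goal condition holds.
    """
    q = defaultdict(lambda: defaultdict(float))
    for ep in range(episodes):
        # Linearly decay epsilon from eps_start to eps_end across episodes.
        eps = eps_start - (eps_start - eps_end) * ep / max(episodes - 1, 1)
        state = env.reset()
        for _ in range(max_steps):
            actions = env.valid_actions(state)
            if random.random() < eps or not q[state]:
                action = random.choice(actions)            # explore
            else:
                action = max(q[state], key=q[state].get)   # exploit greedily
            next_state = env.step(state, action)
            reward = 100.0 if goal_reached(next_state) else 0.0  # sparse goal reward
            best_next = max(q[next_state].values(), default=0.0)
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if goal_reached(state):
                break
    return q

# Algorithm 1: one Q-table per candidate goal (helper names hypothetical).
# q_tables = {g: learn_q_for_goal(env, lambda s, g=g: satisfies(s, g)) for g in goals}
```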
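Algorithm 2 then selects the goal whose learned Q-function best explains the observations. The paper describes three inference mechanisms (the Experiment Setup row mentions the DP measure with δ = 0.1); the sketch below shows only a simple utility-style ranking and is not a faithful reproduction of any of them.

```python
# Hypothetical sketch of the inference stage (Algorithm 2): score each
# candidate goal by the Q-value its learned Q-function assigns to the
# observed (state, action) pairs, then return the best-scoring goal.
# This is a simplified utility-style measure, not the paper's exact metrics.
def infer_goal(q_tables, observations):
    """q_tables: {goal: Q-table}; observations: iterable of (state, action) pairs."""
    scores = {
        goal: sum(q.get(s, {}).get(a, 0.0) for s, a in observations)
        for goal, q in q_tables.items()
    }
    return max(scores, key=scores.get)

# Example (names hypothetical):
# most_likely = infer_goal(q_tables, [(s0, a0), (s1, a1), (s2, a2)])
```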