Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
GRAML: Goal Recognition As Metric Learning
Authors: Matan Shamir, Reuth Mirsky
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated on a versatile set of environments, GRAML shows speed, flexibility, and runtime improvements over the state-of-the-art GR while maintaining accurate recognition. |
| Researcher Affiliation | Academia | Matan Shamir¹, Reuth Mirsky¹,² — ¹Computer Science Department, Bar-Ilan University, Israel; ²Computer Science Department, Tufts University, MA, USA. EMAIL, EMAIL |
| Pseudocode | No | The paper describes steps in regular paragraph text without structured formatting, and no figures are labeled as pseudocode or algorithm. |
| Open Source Code | Yes | https://github.com/MatanShamir1/Grlib |
| Open Datasets | Yes | Building on the GCRL survey and the benchmark environments suggested at Apex RL, we form a collection of GR problems from several sets of environments that adhere to the Gymnasium API, with detailed descriptions of each in Appendix ??. We consider two custom Minigrid environments from the minigrid package [Chevalier-Boisvert et al., 2023], two custom Point Maze environments from the Gymnasium-Robotics package [Fu et al., 2020], the Parking environment from the highway-env package [Leurent, 2018], and the Reach environment from Panda Gym [Gallouédec et al., 2021]. |
| Dataset Splits | No | The paper mentions varying observation sequence lengths (30%, 50%, 70%, 100%) and generating '200 GR problems per scenario' but does not specify training, validation, or test splits in a way that would allow the data partitioning to be reproduced. |
| Hardware Specification | Yes | All experiments were conducted on a commodity Intel i-7 pro. |
| Software Dependencies | No | The paper mentions software like Python, PyTorch, Stable Baselines3, Gymnasium API, minigrid package, Gymnasium-Robotics package, highway-env package, and Panda Gym, but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | Each single-goal agent was trained for 300,000 timesteps, and the goal-conditioned agent was trained for 1 million timesteps. ... G was set to 20, while BG-GRAML used only 5. ... For each environment, we tested observation sequences that are 30%, 50%, 70%, and 100% of the full sequence, both consecutively and non-consecutively. |