Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Relational Reinforcement Learning for Planning with Exogenous Effects
Authors: David Martínez, Guillem Alenyà, Tony Ribeiro, Katsumi Inoue, Carme Torras
JMLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, experimental validation is provided that shows improvements over previous work in both simulation and a robotic task. The robotic task involves a dynamic scenario with several agents where a manipulator robot has to clear the tableware on a table. |
| Researcher Affiliation | Academia | David Martínez (1), Guillem Alenyà (1), Tony Ribeiro (2), Katsumi Inoue (3), Carme Torras (1). (1) Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Barcelona, Spain; (2) Laboratoire des sciences du numérique de Nantes (LS2N), Nantes, France; (3) National Institute of Informatics, Tokyo, Japan |
| Pseudocode | Yes | Algorithm 1: Probabilistic LFIT(E, B); Algorithm 2: Operator Selection(Oinput, T); Algorithm 3: Operator Selection Subsumption(Tree O, T); Algorithm 4: V-MIN |
| Open Source Code | No | The paper does not provide a direct link to source code, an explicit statement of code release, or mention code in supplementary materials for the methodology described in this paper. |
| Open Datasets | Yes | Three IPPC 2014 domains were used in the experiments. Note that they were slightly modified to remove redundancy (e.g. a north(?X,?Y) literal is equivalent to south(?Y,?X), so one can be replaced by the other). |
| Dataset Splits | No | The paper describes how input transitions for model learning are generated randomly (e.g., "the state s is constructed by randomly assigning a value (positive or negative) to every literal"), and that the RL approach generates data through interaction (episodes and runs). It does not specify explicit training/test/validation splits for a fixed dataset in the traditional sense. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | PROST (Keller and Eyerich, 2012) is the planner used as it can obtain good results with probabilistic models containing exogenous effects. |
| Experiment Setup | Yes | The learner parameters used were α = 0.01, ϵ = 0.1, ω = 2, δ = 0.05, κ = 1000, and the subsumption was enabled. The V-MIN exploration threshold was ζ = 3 and Vmin was selected and updated by the teacher depending on the robot performance. |
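
The Dataset Splits row quotes the paper's description of how input states are sampled: every literal is randomly assigned a positive or negative value. The following is a minimal sketch of that idea, not the authors' implementation. The function name `random_state`, the example literals, and the 0.5 probability of a positive value are all assumptions made for illustration; the quoted text does not state the sampling probability.

```python
import random


def random_state(literals, rng=None, p_true=0.5):
    """Build a state by independently assigning True/False to every literal.

    p_true is a hypothetical parameter: the quoted description only says the
    value is assigned randomly, so a uniform choice is assumed here.
    """
    rng = rng or random.Random()
    return {lit: rng.random() < p_true for lit in literals}


# Hypothetical ground literals, named only for illustration.
literals = ["north(c1,c2)", "robot-at(c1)", "obstacle-at(c2)"]

# A seeded generator makes the sampled state repeatable across runs.
state = random_state(literals, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sampling repeatable, which is useful when comparing learners on the same generated transitions.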