Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Relational Reinforcement Learning for Planning with Exogenous Effects

Authors: David Martínez, Guillem Alenyà, Tony Ribeiro, Katsumi Inoue, Carme Torras

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, experimental validation is provided that shows improvements over previous work in both simulation and a robotic task. The robotic task involves a dynamic scenario with several agents where a manipulator robot has to clear the tableware on a table.
Researcher Affiliation	Academia	David Martínez 1 EMAIL Guillem Alenyà 1 EMAIL Tony Ribeiro 2 EMAIL Katsumi Inoue 3 EMAIL Carme Torras 1 EMAIL 1 Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Barcelona, Spain 2 Laboratoire des sciences du numérique de Nantes (LS2N), Nantes, France 3 National Institute of Informatics, Tokyo, Japan
Pseudocode	Yes	Algorithm 1: Probabilistic LFIT(E, B); Algorithm 2: Operator Selection(Oinput, T); Algorithm 3: Operator Selection Subsumption(Tree O, T); Algorithm 4: V-MIN
Open Source Code	No	The paper does not provide a direct link to source code, an explicit statement of code release, or mention code in supplementary materials for the methodology described in this paper.
Open Datasets	Yes	Three IPPC 2014 domains were used in the experiments. Note that they were slightly modified to remove redundancy (e.g. a north(?X,?Y) literal is equivalent to south(?Y,?X), so one can be replaced by the other).
Dataset Splits	No	The paper describes how input transitions for model learning are generated randomly (e.g., "the state s is constructed by randomly assigning a value (positive or negative) to every literal"), and that the RL approach generates data through interaction (episodes and runs). It does not specify explicit training/test/validation splits for a fixed dataset in the traditional sense.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	PROST (Keller and Eyerich, 2012) is the planner used as it can obtain good results with probabilistic models containing exogenous effects.
Experiment Setup	Yes	The learner parameters used were α = 0.01, ϵ = 0.1, ω = 2, δ = 0.05, κ = 1000, and the subsumption was enabled. The V-MIN exploration threshold was ζ = 3 and Vmin was selected and updated by the teacher depending on the robot performance.