Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning

Authors: George Konidaris, Leslie Pack Kaelbling, Tomas Lozano-Perez

JAIR 2018 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We construct an agent that autonomously learns the correct abstract representation of a computer game domain, and rapidly solves it. Finally, we apply these techniques to create a physical robot system that autonomously learns its own symbolic representation of a mobile manipulation task directly from sensorimotor data (point clouds, map locations, and joint angles) and then plans using that representation.
Researcher Affiliation | Academia | George Konidaris (EMAIL), Brown University, Providence RI 02912 / Duke University, Durham NC 27708; Leslie Pack Kaelbling (EMAIL) and Tomas Lozano-Perez (EMAIL), MIT CSAIL, 32 Vassar Street, Cambridge MA 02139. All listed affiliations are universities, and the email domains (.edu, .mit.edu) confirm academic affiliations.
Pseudocode | Yes | Algorithm 1 formalizes the procedures described above in pseudocode, proceeding in three stages: factor generation, symbol enumeration, and operator description generation.
Open Source Code | No | The paper uses several third-party open-source tools such as ar_track_alvar, IAI Kinect2, MoveIt!, PCL, ROS, scikit-learn, WEKA, the FF planner, and the mGPT planner, citing their respective sources. However, there is no explicit statement from the authors providing their own source code for the methodology described in this paper, nor any direct link to a repository for their specific implementation.
Open Datasets | No | The paper describes experiments conducted in a 'continuous playroom domain' and a 'Treasure Game', for which the authors generated their own data. For the robot manipulation task, they collected data from 167 motor skill executions. While these datasets are used for their experiments, the paper does not provide concrete access information (link, DOI, repository, or formal citation for public access) for any of these generated datasets.
Dataset Splits | No | The paper describes data collection processes, such as gathering 5,000 positive and negative examples for the playroom domain, a total of 4,000 option executions for the Treasure Game, and 167 motor skill executions for the robot task. However, it does not specify any explicit training, validation, or test splits for these datasets, nor does it refer to standard predefined splits from external benchmarks.
Hardware Specification | Yes | Results were obtained on a MacBook Pro with a 2.5 GHz Intel Core i5 processor and 8 GB of RAM (Table 1 caption), and on an iMac with a 3.2 GHz Intel Core i5 processor and 16 GB of RAM (Table 3 caption). The experiment uses a robot named Anathema Device, an Adept Pioneer mobile manipulator: an Adept Pioneer LX base and a fixed torso on which are mounted a pair of Kinova Jaco-2 robot arms and a pan-tilt head with a Kinect-2 RGBD sensor (Section 5).
Software Dependencies | No | The paper mentions various software components, including PDDL, the FF planner, the WEKA toolkit (C4.5 decision tree), the scikit-learn toolkit (DBSCAN algorithm, support vector machine), ROS, TRAC-IK, IAI Kinect2, MoveIt!, the Point Cloud Library (PCL), and the mGPT planner. While these are named, specific version numbers are generally not provided in the text, which a reproducible description of ancillary software requires.
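For context, the kind of pinned dependency list this criterion asks for is short; a sketch for the Python-side tools named above (the version numbers here are hypothetical, chosen only to illustrate the format, and are not taken from the paper):

```text
# requirements.txt-style pin list (hypothetical versions, illustration only)
scikit-learn==0.18.1
numpy==1.11.3
scipy==0.18.1
```

Equivalent pins would be needed for the non-Python components (ROS distribution, WEKA release, PCL version, and the planner builds) for the setup to be fully reproducible.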
Experiment Setup | Yes | To obtain training data, we executed 100 randomly selected options sequentially, gathering one set of data that recorded whether each option could run at states observed before and after option execution, and another that recorded the transition data x_i = (s_i, o_i, r_i, s'_i) for each executed option. This procedure was repeated 40 times, for a total of 4000 option executions. Clustering was performed using the DBSCAN algorithm (Ester, Kriegel, Sander, & Xu, 1996) in the scikit-learn toolkit (Pedregosa et al., 2011), with parameters min_samples = 5 and ϵ = 0.4/14 (for partitioning the effects) or ϵ = 0.8/14 (for merging the start states). A support vector machine was used as a probabilistic precondition classifier ... with an RBF kernel, automatic class reweighting, and parameters selected by a grid search with 3-fold cross-validation. Kernel density estimation (Rosenblatt, 1956; Parzen, 1962) was used to model each effect distribution, with a Gaussian kernel and parameters fit using a grid search and 3-fold cross-validation. Monte Carlo sampling was used (m = 100 samples). Rules estimated to be executable with a probability of less than 5% were discarded, and those with an estimated probability of execution greater than 95% were upgraded to certainty.
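The quoted setup maps directly onto standard scikit-learn components. A minimal sketch of that pipeline on synthetic data — the parameter values (min_samples = 5, ϵ = 0.4/14, RBF-kernel SVM with class reweighting, 3-fold grid searches, Gaussian KDE, m = 100 samples, 5%/95% thresholds) follow the paper's description, while the data, hyperparameter grids, and variable names are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic 14-dimensional effect states drawn from two tight clusters,
# standing in for the normalized state features used in the paper.
centers = rng.random((2, 14))
assign = rng.integers(0, 2, size=200)
X = centers[assign] + rng.normal(0.0, 0.002, size=(200, 14))
y = assign  # synthetic "option could run" labels

# 1. Partition effect states with DBSCAN (one cluster per abstract outcome).
labels = DBSCAN(eps=0.4 / 14, min_samples=5).fit_predict(X)

# 2. Probabilistic precondition classifier: RBF-kernel SVM with automatic
#    class reweighting, hyperparameters chosen by 3-fold grid search.
svm = GridSearchCV(
    SVC(kernel="rbf", class_weight="balanced", probability=True),
    param_grid={"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1, 1.0]},
    cv=3,
).fit(X, y)

# 3. Effect distribution: Gaussian KDE, bandwidth chosen by 3-fold grid search.
kde = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    param_grid={"bandwidth": np.logspace(-3, 0, 7)},
    cv=3,
).fit(X[labels == 0])

# 4. Estimate executability by Monte Carlo sampling (m = 100): draw effect
#    states from the KDE, average the precondition probability, then apply
#    the 5% (discard) and 95% (upgrade to certainty) thresholds.
samples = kde.best_estimator_.sample(100, random_state=0)
p = svm.predict_proba(samples)[:, 1].mean()
rule = "discard" if p < 0.05 else "certain" if p > 0.95 else f"keep, p={p:.2f}"
print(f"executability estimate: {p:.2f} -> {rule}")
```

The same structure scales to the paper's 4000-execution dataset; only the data-loading step and the hyperparameter grids would change.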