Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning sparse relational transition models
Authors: Victoria Xia, Zi Wang, Kelsey Allen, Tom Silver, Leslie Pack Kaelbling
ICLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS We apply our approach, SPARE, to a challenging problem of predicting pushing stacks of blocks on a cluttered table top. We describe our domain, the baseline that we compare to and report our results. Figure 4: (a) In a simple 3-block pushing problem instance, data likelihood and learned default standard deviation both improve as more deictic references are added. (b) Comparing performance as a function of number of distractors with a fixed amount of training data. (c) Comparing sample efficiency of SPARE to the baselines. Shaded regions represent 95% confidence interval. |
| Researcher Affiliation | Academia | Victoria Xia, Zi Wang, Kelsey Allen, Tom Silver, Leslie Pack Kaelbling. Equal contribution. Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139. EMAIL. |
| Pseudocode | Yes | Algorithm 1 Greedy procedure for constructing Γ. |
| Open Source Code | No | The paper does not include any explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | We simulate this 3D domain using the physically realistic PyBullet (Coumans & Bai, 2016–2018) simulator. The paper describes generating data by sampling problem instances and action parameters, rather than using a pre-existing publicly available dataset with concrete access information. |
| Dataset Splits | Yes | We held out 20% of the training data as the validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models, memory, or specific computing environments) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'Keras' and 'Adam optimizer' without specific version numbers. It also refers to 'PyBullet (Coumans & Bai, 2016–2018)', which is a range and not a single, specific version number for reproducibility. |
| Experiment Setup | Yes | Predictors for the templates approach were trained for 1000 epochs each with a decaying learning rate starting at 1e-2 and decreasing by a factor of 0.6 every 100 epochs. The GNN was trained using a decaying learning rate starting at 1e-2, and decreasing by a factor of 0.5 every 100 epochs. A total of 900 epochs were used. |
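
The learning-rate schedules quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code: the function name `step_decay_lr` and its parameters are hypothetical, and it assumes a staircase schedule with 0-indexed epochs in which the rate drops at every multiple of 100 epochs. The paper excerpt does not state whether the decay was applied in discrete steps or continuously.

```python
def step_decay_lr(epoch, initial_lr=1e-2, factor=0.6, step=100):
    """Learning rate at a 0-indexed epoch under staircase step decay.

    Assumption: the rate is multiplied by `factor` once every `step` epochs.
    """
    return initial_lr * factor ** (epoch // step)


# Templates predictors: 1000 epochs, start 1e-2, multiply by 0.6 every 100 epochs.
templates_schedule = [step_decay_lr(e, factor=0.6) for e in range(1000)]

# GNN baseline: 900 epochs, start 1e-2, multiply by 0.5 every 100 epochs.
gnn_schedule = [step_decay_lr(e, factor=0.5) for e in range(900)]

if __name__ == "__main__":
    # Print the rate at the start of each 100-epoch block for both models.
    for start in range(0, 1000, 100):
        gnn = gnn_schedule[start] if start < len(gnn_schedule) else None
        print(start, templates_schedule[start], gnn)
```

Because the row says the models were built with Keras, a function of this shape could plausibly be passed to a `LearningRateScheduler` callback, though the excerpt does not say how the decay was actually implemented.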