Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning

Authors: Aviv Netanyahu, Tianmin Shu, Joshua Tenenbaum, Pulkit Agrawal

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conducted experiments with simulated oracles and with human subjects."
Researcher Affiliation | Academia | "(1) Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA; (2) Dept. of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA."
Pseudocode | Yes | "Algorithm 1: Active Reward Refinement"
Open Source Code | Yes | "Project website: https://www.tshu.io/GEM"
Open Datasets | No | "We propose a one-shot imitation learning environment, Watch&Move. ... We design 9 object rearrangement tasks in the Watch&Move environment... Expert demonstrations were created with a planner introduced in (Netanyahu et al., 2021), with a length ranging from 8 to 35 steps." The paper does not provide concrete access information for the Watch&Move tasks and demonstrations (no link, DOI, repository, or formal citation for the dataset itself).
Dataset Splits | No | The paper describes "training sets" (S_D, S+, S-) used in its active reward refinement, but these are collected dynamically during learning rather than being conventional static splits (e.g., percentages or counts) of a predefined dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | "We use PyBox2D to simulate the physical dynamics in the environment. ... For optimizing the network, we use Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.0003. ... We build upon an AIRL implementation (Wang et al., 2020)..." The paper names software tools (PyBox2D, the Adam optimizer, an AIRL implementation) but does not specify their version numbers.
Experiment Setup | Yes | "M-AIRL is executed for 500k generator steps; the expert batch size is the length of the expert demonstrations. For the model-based policy, we set β = 0.3 in Eq. (4). The discriminator is updated for 4 steps after every model-based generator execution. ... We apply 5k network updates per query iteration. For optimizing the network, we use Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.0003. For each update, we sample a batch of 16 states for the regression loss and a batch of 16 pairs of positive and negative states for the reward ranking loss. ... we set λ = 0.2 in Eq. (8)." These reported values are collected in the configuration sketch below.
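
For reference, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object. The following is a minimal sketch assuming a PyTorch-style setup; the field names, the toy reward network, and the optimizer wiring are illustrative assumptions, not the authors' released code. Only the numeric values come from the text above.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class GEMTrainingConfig:
    """Hyperparameters reported in the paper's experiment setup.

    Field names are illustrative; only the values are taken from the paper.
    """
    generator_steps: int = 500_000       # M-AIRL generator steps
    beta: float = 0.3                    # model-based policy weight, Eq. (4)
    discriminator_steps_per_gen: int = 4  # discriminator updates per generator execution
    updates_per_query: int = 5_000       # network updates per query iteration
    learning_rate: float = 3e-4          # Adam (Kingma & Ba, 2014)
    regression_batch_size: int = 16      # states sampled for the regression loss
    ranking_batch_size: int = 16         # positive/negative pairs for the reward ranking loss
    lam: float = 0.2                     # loss weighting, Eq. (8)


config = GEMTrainingConfig()

# Hypothetical stand-in for the learned reward network; the paper's actual
# architecture is not reproduced here.
reward_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_net.parameters(), lr=config.learning_rate)
```

Note that the expert batch size is not a fixed constant in this sketch, since the paper states it equals the length of the expert demonstrations (8 to 35 steps per task).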