Learning Human Objectives by Evaluating Hypothetical Behavior
Authors: Siddharth Reddy, Anca Dragan, Sergey Levine, Shane Legg, Jan Leike
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ReQueST with simulated users on a state-based 2D navigation task and the image-based CarRacing video game. The results show that ReQueST significantly outperforms prior methods in learning reward models that transfer to new environments with different initial state distributions. |
| Researcher Affiliation | Collaboration | 1University of California, Berkeley; 2DeepMind. Correspondence to: Siddharth Reddy <sgr@berkeley.edu>, Jan Leike <leike@google.com>. |
| Pseudocode | Yes | Algorithm 1: Reward Query Synthesis via Trajectory Optimization (ReQueST). (An illustrative sketch of this query loop appears after this table.) |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its own source code, nor does it provide a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | MNIST classification... MNIST (LeCun, 1998)... image-based CarRacing from the OpenAI Gym (Brockman et al., 2016). (See the environment-setup sketch after this table.) |
| Dataset Splits | No | The paper describes training and test environments with different initial state distributions for MNIST, but it does not specify explicit numerical splits for a validation set (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions "Adam (Kingma & Ba, 2014)" as an optimizer and "OpenAI Gym (Brockman et al., 2016)" as a platform, but it does not provide specific version numbers for any software libraries or dependencies used for the implementation. |
| Experiment Setup | No | While the paper describes the experimental domains and evaluation metrics, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training configurations in the main text. |
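
The Pseudocode row above names Algorithm 1 (ReQueST), which synthesizes hypothetical trajectories and asks a user to label them as reward queries. The following is a minimal, hypothetical sketch of that active-learning structure, not the authors' implementation: the paper optimizes trajectories under a learned generative model with several acquisition objectives, whereas this toy version samples random candidate states, scores them by reward-ensemble disagreement, and queries a simulated user. All names and dimensions are illustrative.

```python
# Toy sketch of a ReQueST-style reward-query loop (assumptions labeled above).
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ENSEMBLE_SIZE, N_CANDIDATES, N_QUERIES = 4, 5, 64, 20

def true_reward(s):
    # Stand-in for the human's reward judgment (the "simulated user").
    return float(s @ np.array([1.0, -0.5, 0.25, 0.0]))

def fit_ensemble(X, y):
    # Fit a small ensemble of linear reward models on bootstrap resamples.
    models = []
    for _ in range(ENSEMBLE_SIZE):
        idx = rng.integers(0, len(X), size=len(X))
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        models.append(w)
    return np.stack(models)

# Seed the reward dataset with a few labeled states.
X = rng.normal(size=(8, STATE_DIM))
y = np.array([true_reward(s) for s in X])

for _ in range(N_QUERIES):
    ensemble = fit_ensemble(X, y)
    # "Synthesize" candidate queries; ReQueST would instead optimize
    # trajectories under a learned dynamics/generative model.
    candidates = rng.normal(size=(N_CANDIDATES, STATE_DIM))
    preds = candidates @ ensemble.T              # (N_CANDIDATES, ENSEMBLE_SIZE)
    disagreement = preds.var(axis=1)             # epistemic-uncertainty proxy
    query = candidates[disagreement.argmax()]    # most informative candidate
    # Ask the (simulated) user to label the hypothetical behavior.
    X = np.vstack([X, query])
    y = np.append(y, true_reward(query))

final_weights = fit_ensemble(X, y).mean(axis=0)
print("learned reward weights:", np.round(final_weights, 2))
```

Maximizing ensemble disagreement is one of several query objectives; picking only uncertain candidates is the simplest choice that still shows why synthesized queries can be more informative than randomly encountered states.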
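
The Open Datasets row cites the image-based CarRacing task from the OpenAI Gym. A hedged setup sketch follows; the environment ID and the 4-tuple `step` signature match classic Gym releases (and require the Box2D extra), while newer gymnasium versions rename the ID and change the API.

```python
# Assumes classic OpenAI Gym with Box2D installed (e.g., pip install "gym[box2d]").
import gym

env = gym.make("CarRacing-v0")   # 96x96 RGB observations, continuous control
obs = env.reset()
for _ in range(10):
    action = env.action_space.sample()          # [steer, gas, brake]
    obs, reward, done, info = env.step(action)  # classic 4-tuple step API
    if done:
        obs = env.reset()
env.close()
```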