Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning Human Objectives by Evaluating Hypothetical Behavior
Authors: Siddharth Reddy, Anca Dragan, Sergey Levine, Shane Legg, Jan Leike
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ReQueST with simulated users on a state-based 2D navigation task and the image-based Car Racing video game. The results show that ReQueST significantly outperforms prior methods in learning reward models that transfer to new environments with different initial state distributions. |
| Researcher Affiliation | Collaboration | ¹University of California, Berkeley ²DeepMind. Correspondence to: Siddharth Reddy <EMAIL>, Jan Leike <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Reward Query Synthesis via Trajectory Optimization (ReQueST) |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its own source code, nor does it provide a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | MNIST classification... MNIST (LeCun, 1998)... image-based Car Racing from the OpenAI Gym (Brockman et al., 2016) |
| Dataset Splits | No | The paper describes training and test environments with different initial state distributions for MNIST, but it does not specify explicit numerical splits for a validation set (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions "Adam (Kingma & Ba, 2014)" as an optimizer and "OpenAI Gym (Brockman et al., 2016)" as a platform, but it does not provide specific version numbers for any software libraries or dependencies used for the implementation. |
| Experiment Setup | No | While the paper describes the experimental domains and evaluation metrics, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training configurations in the main text. |