Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Modeling Human Beliefs about AI Behavior for Scalable Oversight
Authors: Leon Lang, Patrick Forré
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | in this theoretical work, we are exploring the idea to model human beliefs about AI behavior. |
| Researcher Affiliation | Academia | Leon Lang, University of Amsterdam; Patrick Forré, University of Amsterdam |
| Pseudocode | No | The paper primarily presents formal definitions, propositions, theorems, and proofs related to human belief models and their implications for value learning, without including any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper outlines a theoretical framework and proposes future practical implementations, but does not contain any statements about open-sourcing code, nor does it provide links to code repositories. |
| Open Datasets | No | The paper is theoretical and illustrates its concepts using conceptual examples such as 'children stories' and 'a 2x2 gridworld' (Sections 2.9 and 3.3), but does not use or provide access information for any publicly available datasets. |
| Dataset Splits | No | As this is a theoretical paper that does not conduct empirical experiments with datasets, there is no information provided regarding dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper focuses on theoretical development and conceptual examples, and therefore does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper presents a theoretical framework and does not detail any software implementations or dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical, presenting conceptual frameworks and mathematical analyses rather than empirical experimental setups, so it provides no details on hyperparameters or training configurations. |