Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Modeling Human Beliefs about AI Behavior for Scalable Oversight

Authors: Leon Lang, Patrick Forré

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "In this theoretical work, we are exploring the idea to model human beliefs about AI behavior."
Researcher Affiliation | Academia | Leon Lang (University of Amsterdam); Patrick Forré (University of Amsterdam)
Pseudocode | No | The paper primarily presents formal definitions, propositions, theorems, and proofs about human belief models and their implications for value learning; it contains no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper develops a theoretical framework and proposes future practical implementations, but it neither states that code will be released nor links to a code repository.
Open Datasets | No | The paper is theoretical and illustrates its concepts with conceptual examples such as "children stories" and "a 2x2 gridworld" (Sections 2.9 and 3.3); it neither uses nor provides access information for any publicly available dataset.
Dataset Splits | No | Because the paper conducts no empirical experiments with datasets, it provides no training, validation, or test splits.
Hardware Specification | No | The paper focuses on theoretical development and conceptual examples and therefore describes no hardware used for running experiments.
Software Dependencies | No | The paper presents a theoretical framework and details no software implementations or versioned dependencies.
Experiment Setup | No | The paper presents conceptual frameworks and mathematical analyses rather than empirical experimental setups, so it reports no hyperparameters or training configurations.