Path-Specific Objectives for Safer Agent Incentives

Authors: Sebastian Farquhar, Ryan Carey, Tom Everitt

AAAI 2022, pp. 9529-9538 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We highlight the opportunities and dangers of these approaches empirically in a content recommendation environment from Krueger, Maharaj, and Leike (2020). Our main contributions are: we formalize the problem of delicate state as a complement to reward specification (Section 2); we propose path-specific objectives (Section 5); and we show this generalizes and unifies prior work (Section 6). From Section 7 (Experiments): We present two experimental tests of our approach in order to elaborate the underlying mathematical mechanisms. (A toy illustration of the path-specific idea is sketched at the end of this section.)
Researcher Affiliation | Collaboration | Sebastian Farquhar (1,2), Ryan Carey (1), Tom Everitt (2); 1: University of Oxford, 2: DeepMind
Pseudocode | No | The paper describes the methods conceptually and mathematically, including definitions and propositions, but it does not include any explicit pseudocode blocks or algorithms.
Open Source Code | No | The paper does not provide any specific links to source code repositories or state that the code for its methodology is publicly available.
Open Datasets | Yes | We demonstrate our method using the content recommendation simulation from Krueger, Maharaj, and Leike (2020).
Dataset Splits | No | The paper refers to a content recommendation simulation from a cited work but does not explicitly provide details on how the data was split into training, validation, and test sets. It mentions 'Number of steps' and 'Batch size' among the hyperparameters, but no split percentages or counts.
Hardware Specification | No | The paper does not include any specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used.
Experiment Setup | Yes | Table 3: Content Recommendation Hyperparameters. Number of user types (K): 10; Number of article types (M): 10; Number of environments: 20; Initialization scale: 0.03; Loyalty update rate (α1): 0.03; Preference update rate: 0.003 (with normalization); Architecture: 1-layer, 100-unit ReLU MLP; Optimization algorithm: SGD (lr = 0.01, ρ = 0.1); Batch size: 10; Number of steps: 2000 (PBT every 10).
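
For concreteness, the reported architecture and optimizer can be instantiated roughly as follows. This is a hypothetical sketch, not code from the paper: the framework (PyTorch), the input and output dimensions (taken to match the 10 user types and 10 article types), and the reading of ρ as SGD momentum are all assumptions; only the layer width, activation, learning rate, batch size, and step count come from Table 3.

```python
# Hypothetical instantiation of the Table 3 setup (not the authors' code).
# Assumptions: PyTorch as the framework, a one-hot user-type input, an
# article-type score output, and rho interpreted as SGD momentum.
import torch
import torch.nn as nn

K, M = 10, 10  # number of user types and article types (Table 3)

# 1-layer, 100-unit ReLU MLP
model = nn.Sequential(
    nn.Linear(K, 100),   # assumed input: one-hot user type
    nn.ReLU(),
    nn.Linear(100, M),   # assumed output: one score per article type
)

# SGD(lr=0.01, rho=0.1), reading rho as momentum (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.1)

BATCH_SIZE = 10    # Table 3
NUM_STEPS = 2000   # Table 3; population-based training every 10 steps (not shown here)
```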
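
The path-specific-objective idea referenced in the Research Type row can also be illustrated with a toy example. The sketch below is purely illustrative: the environment dynamics, policies, and variable names are all assumptions, and it is neither the authors' implementation nor the Krueger, Maharaj, and Leike (2020) simulation. It shows only the core mechanism: reward is evaluated against a counterfactual version of the delicate state (here, user preferences) that evolves as if a fixed default policy had acted, so the trained policy gains no incentive to shift that state.

```python
# Toy illustration of a path-specific objective (all details are assumptions,
# not the paper's environment or code). The delicate state here is "user
# preferences": reward is scored against a counterfactual preference vector
# that evolves under a fixed default policy, removing any incentive for the
# learned policy to manipulate preferences on the factual path.
import numpy as np


def step(engagement, preferences, action, rng):
    """Assumed toy dynamics: both engagement and preferences respond to the action."""
    new_engagement = engagement + 0.1 * float(np.dot(preferences, action))
    new_preferences = preferences + 0.05 * action + 0.01 * rng.standard_normal(action.shape)
    return new_engagement, new_preferences


def path_specific_return(policy, default_policy, horizon, rng):
    """Return with the delicate variable (preferences) held to its default-policy path."""
    engagement = 0.0
    preferences = np.ones(3) / 3.0          # factual preferences
    cf_preferences = preferences.copy()     # counterfactual (non-manipulated) preferences
    total = 0.0
    for _ in range(horizon):
        action = policy(engagement, preferences)
        # Delicate path: preferences evolve as if the default policy had acted.
        cf_preferences = cf_preferences + 0.05 * default_policy(engagement, cf_preferences)
        # Reward uses the counterfactual preferences, cutting the manipulation path.
        total += float(np.dot(cf_preferences, action)) + engagement
        engagement, preferences = step(engagement, preferences, action, rng)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    greedy = lambda eng, prefs: np.eye(3)[int(np.argmax(prefs))]  # push the top-scoring item
    default = lambda eng, prefs: np.ones(3) / 3.0                 # uniform "safe" baseline
    print(path_specific_return(greedy, default, horizon=10, rng=rng))
```

The paper formalizes this with causal models and path-specific effects rather than a forward simulation like this; the toy only gestures at the incentive-removal idea.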