Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
AI Alignment with Changing and Influenceable Reward Functions
Authors: Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main contributions can be summarized as follows: 1. We provide the formal language of Dynamic Reward MDPs (DR-MDPs) for analyzing AI decisions and influence in settings with changing reward functions. 2. We show how existing AI alignment techniques may systematically incentivize questionable influence when used in dynamic-reward settings. 3. By comparing 8 natural notions of alignment, and showing that they all may either fail to avoid undesirable influence or are impractically risk-averse, we elucidate trade-offs that seem inherent to choosing any objective. |
| Researcher Affiliation | Academia | ¹UC Berkeley. Correspondence to: EMAIL. |
| Pseudocode | Yes | Algorithm 1 Learning reward functions and their dynamics |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code for the methodology is being released. |
| Open Datasets | No | The paper uses illustrative toy examples (e.g., Conspiracy Influence DR-MDP, Writer's curse, Clickbait DR-MDP) for theoretical analysis, not publicly available datasets for empirical training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets, thus no dataset splits for training, validation, or testing are provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers needed to replicate its work. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations. |