Position: Intent-aligned AI Systems Must Optimize for Agency Preservation

Authors: Catalin Mitelut, Benjamin Smith, Peter Vamplew

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Appendix E we provide simulations to show how elementary interactions with AI systems that do not penalize agency loss can result in decreasing agency or options of end users. ... We simulate an episode of 10,000 action selections by the human and compute the value that the observing AI agent would ascribe to each action at each time point using TD-learning (with a learning rate of 0.1; colored plot lines in Fig 3). ... An average over ten independent episodes similarly yields an uneven recommendation distribution of 23%, 28%, 31%, and 18%, respectively, across the four actions. [An illustrative reconstruction of this simulation setup appears after the table.]
Researcher Affiliation | Academia | (1) Forum Basiliense, University of Basel; (2) University of Oregon; (3) Federation University Australia.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | Code will be provided following the blind-review process.
Open Datasets | No | The paper uses conceptual simulations to demonstrate its arguments rather than publicly available datasets with concrete access information.
Dataset Splits | No | The paper uses conceptual simulations and refers to 'episodes', but does not specify the formal training, validation, or test splits typically reported in empirical ML papers.
Hardware Specification | No | The paper does not specify the hardware used to run its simulations or experiments; it describes only conceptual models.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers.
Experiment Setup | Yes | We simulate an episode of 10,000 action selections by the human and compute the value that the observing AI agent would ascribe to each action at each time point using TD-learning (with a learning rate of 0.1; colored plot lines in Fig 3). ... We simulated such a paradigm using a hard boundary for value depletion (e.g. preventing AI systems from nudging or decreasing the value of an option beyond a certain limit, here 0.9 × initial value) and show the somewhat trivial result that both action selection and valuation are better preserved in such scenarios (Fig 6b,c).
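
The simulation quoted in the Research Type and Experiment Setup rows is described only at a high level, and the authors state that code will be released after the blind-review process. The Python sketch below is therefore an illustrative reconstruction under stated assumptions, not the authors' implementation: an observing agent maintains TD(0)-style value estimates for four options while a simulated human makes 10,000 action selections, using the quoted learning rate of 0.1. The human policy (softmax over hypothetical option values), the softmax temperature, the reward noise, and the softmax readout of a "recommendation distribution" are all placeholders that the excerpt does not specify.

    import numpy as np

    # Illustrative reconstruction only; the paper's Appendix E code is not
    # yet public and all dynamics below are assumptions.
    rng = np.random.default_rng(0)

    n_actions = 4       # four options, matching the four percentages in the excerpt
    n_steps = 10_000    # "an episode of 10,000 action selections"
    alpha = 0.1         # TD learning rate quoted in the paper

    # Hypothetical underlying option values; the excerpt does not say how the
    # human chooses, so a softmax policy over these values is assumed.
    true_values = rng.uniform(0.5, 1.0, size=n_actions)
    est_values = np.zeros(n_actions)   # the observing AI's running value estimates

    def softmax(x, temp=0.2):
        z = (x - x.max()) / temp
        p = np.exp(z)
        return p / p.sum()

    for _ in range(n_steps):
        a = rng.choice(n_actions, p=softmax(true_values))   # human's action selection
        r = true_values[a] + rng.normal(0.0, 0.05)          # noisy observed payoff (assumed)
        est_values[a] += alpha * (r - est_values[a])        # TD(0)-style value update

    # Read a "recommendation distribution" off the learned estimates.
    print("estimated values:           ", np.round(est_values, 3))
    print("recommendation distribution:", np.round(softmax(est_values), 3))

Plugging in the paper's actual human-choice and nudging dynamics (not available before code release) would be required to reproduce the 23%/28%/31%/18% distribution reported in the excerpt; this sketch only illustrates the loop structure the text describes.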
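
The Experiment Setup row also quotes a hard boundary on value depletion: an option's value may not be nudged or depleted below 0.9 × its initial value. A hypothetical way to impose that constraint within the loop above is sketched below; the function name, the depletion mechanism, and the depletion rate are assumptions, since the excerpt states only the 0.9 × initial-value floor.

    def deplete_with_floor(values, selected, initial_values,
                           depletion_rate=1e-4, floor_fraction=0.9):
        """Deplete the selected option's value, clipped at the hard boundary of
        floor_fraction * its initial value (0.9 in the quoted setup).
        The depletion_rate is a hypothetical placeholder, not from the paper."""
        values = values.copy()
        floor = floor_fraction * initial_values[selected]
        values[selected] = max(values[selected] - depletion_rate, floor)
        return values

    # Example use: take initial_values = true_values.copy() before the loop,
    # then after each selection replace the fixed true_values with
    #     true_values = deplete_with_floor(true_values, a, initial_values)
    # Running the loop with and without this floor mirrors, in spirit, the
    # bounded vs. unbounded comparison the excerpt attributes to Fig 6b,c.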