Value Alignment Verification

Authors: Daniel S Brown, Jordan Schneider, Anca Dragan, Scott Niekum

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyze verification of exact value alignment for rational agents and propose and analyze heuristic and approximate value alignment verification tests in a wide range of gridworlds and a continuous autonomous driving domain. Finally, in Section 4.5 we study the most general setting of implicit human, implicit robot. We propose an algorithm for approximate value alignment verification in continuous state and action spaces and provide empirical results in a continuous autonomous driving domain where the human can only query the robot for preferences over trajectories. We now study the empirical performance of value alignment verification tests, first in the explicit human setting and then in the implicit human setting. Figure 4b displays the results of the synthetic human experiments.
Researcher Affiliation | Academia | (1) University of California, Berkeley, USA; (2) University of Texas at Austin, USA.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Source code and videos are available at https://sites.google.com/view/icml-vav.
Open Datasets | Yes | We also analyze verification of exact value alignment for rational agents and propose and analyze heuristic and approximate value alignment verification tests in a wide range of gridworlds and a continuous autonomous driving domain. We next analyze approximate value alignment verification in the continuous autonomous driving domain from Sadigh et al. (2017).
Dataset Splits | No | The paper reports testing accuracy but does not specify explicit train/validation/test splits with percentages or sample counts in the main text.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not list the software dependencies (e.g., library names with version numbers) needed to replicate the experiments.
Experiment Setup | No | The main text does not give specific experimental setup details such as concrete hyperparameter values, training configurations, or system-level settings. It directs readers to "See Appendix G for experimental parameters and details of the testing reward generation protocol", so these details appear only in the appendix, not in the main paper.