Aligning AI With Shared Human Values

Authors: Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present empirical results and analysis on ETHICS. Table 2 presents the results of these models on each ETHICS dataset.
Researcher Affiliation | Collaboration | Dan Hendrycks (UC Berkeley), Collin Burns (Columbia University), Steven Basart (UChicago), Andrew Critch (UC Berkeley), Jerry Li (Microsoft), Dawn Song (UC Berkeley), Jacob Steinhardt (UC Berkeley)
Pseudocode | No | The paper describes its methods in prose but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "The dataset is available at github.com/hendrycks/ethics." This link points to the dataset, not to source code for the methodology or experiments described in the paper.
Open Datasets | Yes | We introduce the ETHICS dataset... The dataset is available at github.com/hendrycks/ethics.
Dataset Splits | Yes | The paper reports the following per-task split sizes (see the verification sketch after this reproducibility table):

Split     | Justice | Deontology | Virtue | Utilitarianism | Commonsense
Dev       | 21791   | 18164      | 28245  | 13738          | 13910
Test      | 2704    | 3596       | 4975   | 4808           | 3885
Hard Test | 2052    | 3536       | 4780   | 4272           | 3964
Hardware Specification | No | The paper mentions models such as "BERT-base, BERT-large, RoBERTa-large, and ALBERT-xxlarge" and "GPT-3", and discusses their parameter sizes, but it does not specify any exact hardware components such as GPU models, CPU models, or memory.
Software Dependencies | No | The paper mentions software such as the "transformers library (Wolf et al., 2019)", "fasttext", and "GloVe vectors", but it does not provide specific version numbers for these software components.
Experiment Setup | Yes | For these tasks, we do grid search over the hyperparameters for each model architecture, with a learning rate in {1 × 10^-5, 3 × 10^-5}, a batch size in {8, 16}, and a number of epochs in {2, 4} using the normal Test set. For every task we use weight decay of 0.01 and restrict the maximum number of tokens per input to 64, with the exception of Commonsense Morality, for which we use a maximum token length of 512 due to longer inputs.
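
For readers who want to check the split sizes quoted in the Dataset Splits row against the released files, the following is a small, hedged sketch. The ./ethics extraction directory and the <task>_{train,test,test_hard}.csv naming are assumptions about the github.com/hendrycks/ethics release (the Commonsense files, for instance, may be named differently), and the "Dev" split is taken to correspond to the *_train.csv files; none of these details are stated in the paper.

```python
# Hedged sanity check of the split sizes reported in the table above.
# Directory layout and file names are assumptions about the released
# github.com/hendrycks/ethics archive, not facts from the paper.
import csv
from pathlib import Path

EXPECTED = {
    # task: (Dev, Test, Hard Test) sizes from the table above
    "justice": (21791, 2704, 2052),
    "deontology": (18164, 3596, 3536),
    "virtue": (28245, 4975, 4780),
    "utilitarianism": (13738, 4808, 4272),
    "commonsense": (13910, 3885, 3964),
}
SUFFIXES = ("train", "test", "test_hard")  # assumed to map to Dev / Test / Hard Test


def count_rows(path: Path) -> int:
    """Number of data rows in a CSV file, excluding a single header row."""
    with path.open(newline="", encoding="utf-8") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


root = Path("ethics")  # assumed extraction directory
for task, sizes in EXPECTED.items():
    for suffix, expected in zip(SUFFIXES, sizes):
        candidate = root / task / f"{task}_{suffix}.csv"  # assumed naming convention
        if candidate.exists():
            print(f"{task} {suffix}: {count_rows(candidate)} rows (table says {expected})")
        else:
            print(f"{task} {suffix}: {candidate} not found; check the repository layout")
```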
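
The quoted setup amounts to a 2 × 2 × 2 grid over learning rate, batch size, and epochs, with weight decay and the token limit held fixed. Below is a minimal sketch of such a grid using the Hugging Face transformers Trainer API, not the authors' released training code: the two-sentence toy dataset and its labels are placeholders for a tokenized ETHICS split, and "bert-base-uncased" stands in for the architectures the paper evaluates.

```python
# Minimal sketch of the quoted hyperparameter grid search (illustration only).
from itertools import product

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"
MAX_LEN = 64  # the paper uses 512 for Commonsense Morality, 64 elsewhere


class ToyDataset(torch.utils.data.Dataset):
    """Tiny stand-in for a tokenized ETHICS split (illustration only)."""

    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=MAX_LEN, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item


tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
train_data = ToyDataset(["I helped an elderly neighbor carry groceries.",
                         "I lied to a friend to take their money."],
                        [0, 1], tokenizer)
eval_data = ToyDataset(["I returned a lost wallet to its owner."], [0], tokenizer)

# Grid from the quoted setup: lr in {1e-5, 3e-5}, batch size in {8, 16}, epochs in {2, 4}.
for lr, batch_size, epochs in product([1e-5, 3e-5], [8, 16], [2, 4]):
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    args = TrainingArguments(
        output_dir=f"out_lr{lr}_bs{batch_size}_ep{epochs}",
        learning_rate=lr,
        per_device_train_batch_size=batch_size,
        num_train_epochs=epochs,
        weight_decay=0.01,  # fixed across the grid, as stated in the quote
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_data, eval_dataset=eval_data)
    trainer.train()
    print(trainer.evaluate())
```

In the paper's setup, the best configuration would be chosen by performance on the normal Test set; the sketch above only runs the grid and prints the evaluation metrics for each configuration.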