Aligning AI With Shared Human Values
Authors: Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present empirical results and analysis on ETHICS. Table 2 presents the results of these models on each ETHICS dataset. |
| Researcher Affiliation | Collaboration | Dan Hendrycks (UC Berkeley), Collin Burns* (Columbia University), Steven Basart (UChicago), Andrew Critch (UC Berkeley), Jerry Li (Microsoft), Dawn Song (UC Berkeley), Jacob Steinhardt (UC Berkeley) |
| Pseudocode | No | The paper describes its methods in prose but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "The dataset is available at github.com/hendrycks/ethics." This link refers to the dataset, not the source code for the methodology or experiments described in the paper. |
| Open Datasets | Yes | We introduce the ETHICS dataset... The dataset is available at github.com/hendrycks/ethics. |
| Dataset Splits | Yes | Split sizes (Dev / Test / Hard Test): Justice 21791 / 2704 / 2052; Deontology 18164 / 3596 / 3536; Virtue 28245 / 4975 / 4780; Utilitarianism 13738 / 4808 / 4272; Commonsense 13910 / 3885 / 3964. (See the loading sketch below this table.) |
| Hardware Specification | No | The paper mentions models like "BERT-base, BERT-large, RoBERTa-large, and ALBERT-xxlarge" and "GPT-3", and discusses their parameter sizes, but it does not specify any exact hardware components like GPU models, CPU models, or memory. |
| Software Dependencies | No | The paper mentions software like "transformers library (Wolf et al., 2019)", "fasttext", and "GloVe vectors", but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For these tasks, we do grid search over the hyperparameters for each model architecture, with a learning rate in {1e-5, 3e-5}, a batch size in {8, 16}, and a number of epochs in {2, 4} using the normal Test set. For every task we use weight decay of 0.01 and restrict the maximum number of tokens per input to 64, with the exception of Commonsense Morality, for which we use a maximum token length of 512 due to longer inputs. (See the grid-search sketch below this table.) |
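
The split sizes quoted above could be checked against a local checkout of github.com/hendrycks/ethics. This is a minimal sketch, not verified against the repository: the per-task directory layout, the CSV file names, and the assumption that the paper's "Dev" split corresponds to the training CSVs are all guesses about how the dataset is packaged.

```python
import os
import pandas as pd

# Assumed path to a local clone of github.com/hendrycks/ethics.
DATA_DIR = "ethics"
TASKS = ["justice", "deontology", "virtue", "utilitarianism", "commonsense"]
# Assumed file-name suffixes per split; "Dev" in the paper is assumed to
# map to the training CSVs in the repo.
SPLITS = {"dev": "_train.csv", "test": "_test.csv", "hard": "_test_hard.csv"}

for task in TASKS:
    for split, suffix in SPLITS.items():
        path = os.path.join(DATA_DIR, task, task + suffix)
        if os.path.exists(path):
            n = len(pd.read_csv(path))  # one labeled example per row
            print(f"{task:15s} {split:5s} {n:6d}")
        else:
            print(f"{task:15s} {split:5s} missing: {path}")
```

If the layout assumption holds, the printed counts should match the Dev/Test/Hard Test numbers reported in the table.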
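
The grid search described in the Experiment Setup row can be made concrete with a short sketch using the Hugging Face transformers library the paper mentions. This is not the authors' released code: the model name, the toy dataset standing in for an ETHICS split, and the accuracy metric are placeholder assumptions. Only the hyperparameter grid (learning rate, batch size, epochs), the fixed weight decay of 0.01, and the 64-token cap (512 for Commonsense Morality) come from the paper.

```python
import itertools
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # assumption: one of the BERT-family models tuned
MAX_LEN = 64                      # per the paper; 512 for Commonsense Morality


class ToyDataset(torch.utils.data.Dataset):
    """Placeholder standing in for a tokenized ETHICS split."""

    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, max_length=MAX_LEN,
                             padding="max_length")
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (logits.argmax(-1) == labels).mean()}


tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
train_ds = ToyDataset(["I kept my promise.", "I broke my promise."], [1, 0], tokenizer)
eval_ds = ToyDataset(["I told the truth."], [1], tokenizer)  # stands in for the normal Test set

# Grid from the paper: lr in {1e-5, 3e-5}, batch size in {8, 16}, epochs in {2, 4}.
best = None
for lr, bs, epochs in itertools.product([1e-5, 3e-5], [8, 16], [2, 4]):
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    args = TrainingArguments(output_dir=f"out_lr{lr}_bs{bs}_ep{epochs}",
                             learning_rate=lr,
                             per_device_train_batch_size=bs,
                             num_train_epochs=epochs,
                             weight_decay=0.01)  # fixed across all runs, per the paper
    trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                      eval_dataset=eval_ds, compute_metrics=accuracy)
    trainer.train()
    acc = trainer.evaluate()["eval_accuracy"]
    if best is None or acc > best[0]:
        best = (acc, lr, bs, epochs)

print("best (accuracy, lr, batch_size, epochs):", best)
```

As in the paper, the grid is selected on the normal Test set; the Hard Test set is held out entirely from hyperparameter selection.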