Position: A Roadmap to Pluralistic Alignment
Authors: Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell L Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We highlight empirical evidence, both from our own experiments and from other work, that standard alignment procedures might reduce distributional pluralism in models, motivating the need for further research on pluralistic alignment." and "As shown in Table 1, almost all pre-aligned models have lower Jensen-Shannon distance to the target human distribution than the post-aligned models for both datasets." (See the Jensen-Shannon sketch after the table.) |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, University of Washington, Seattle, Washington, USA 2Department of Computer Science, Stanford University, Stanford, California, USA 3Department of Statistics, University of Washington, Seattle, Washington, USA 4Department of Electrical Engineering and Computer Science, MIT, Cambridge, Massachusetts, USA 5Allen Institute for Artificial Intelligence, Seattle, Washington, USA. |
| Pseudocode | No | The paper defines concepts and discusses implementations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link to experimental code ('Code can be found at: https://github.com/jfisher52/AI_Pluralistic_Alignment'), but this code covers only the specific experiments validating one hypothesis, not the broader conceptual framework for pluralistic alignment that the paper describes. |
| Open Datasets | Yes | We use two diverse multiple-choice datasets: the Global Opinion QA (Global QA) dataset, an aggregation of cross-national surveys designed to capture opinions on global issues (Durmus et al., 2023), and the Machine Personality Inventory (MPI), a collection of 120 questions designed to evaluate human personality traits (Jiang et al., 2023). |
| Dataset Splits | No | The paper describes using existing datasets (Global Opinion QA, MPI) for evaluation by comparing model distributions to human distributions, but it does not specify traditional training, validation, or test splits for these datasets within the context of their own experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU, CPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, or specific library versions) used for the experiments. |
| Experiment Setup | Yes | "To create the model distribution, we utilized the technique of in-context learning to steer the model to output the letter of the multiple choice answer it wanted to select as the first, next token. In order to remove any bias these in-context examples might implicitly have, we prompted the model with the same prompt a total of 5 times, each time randomly selecting the 'correct' answer shown in the in-context examples. We then averaged the probabilities over these five distributions." (See the prompt-averaging sketch after the table.) |
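
The "Research Type" row quotes the paper's Jensen-Shannon comparison between model answer distributions and target human distributions. The sketch below (not the authors' released code) shows how such a comparison can be computed with SciPy's `jensenshannon`; the two example distributions are illustrative, not values from the paper.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_distance(model_probs, human_probs):
    """Jensen-Shannon distance between two answer distributions."""
    p = np.asarray(model_probs, dtype=float)
    q = np.asarray(human_probs, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize defensively
    return jensenshannon(p, q, base=2)

# Illustrative 4-option question: model probabilities vs. aggregated
# human response shares (made-up numbers, not results from the paper).
model_dist = [0.70, 0.10, 0.10, 0.10]
human_dist = [0.40, 0.30, 0.20, 0.10]
print(f"JS distance: {js_distance(model_dist, human_dist):.3f}")
```

A lower distance means the model's answer distribution sits closer to the human distribution, which is the direction the paper reports for most pre-aligned models.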
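
The "Experiment Setup" row describes prompting the model five times, each time randomizing which answers the in-context examples mark as "correct", and averaging the resulting answer-letter distributions. The following sketch illustrates that loop under stated assumptions: `first_token_letter_probs` is a hypothetical stand-in for a real LM call that returns first-token probabilities over the answer letters, and the prompt template is an assumption, not the paper's exact format.

```python
import random
import numpy as np

LETTERS = ["A", "B", "C", "D"]

def first_token_letter_probs(prompt):
    # Hypothetical stand-in: replace with a real LM call that returns the
    # probability the model assigns each answer letter as its first token.
    return [1.0 / len(LETTERS)] * len(LETTERS)

def build_prompt(question, options, icl_examples, rng):
    """Format in-context examples, each with a randomly chosen 'correct'
    answer, followed by the target question (template is an assumption)."""
    blocks = []
    for ex_q, ex_opts in icl_examples:
        choices = "\n".join(f"{l}. {o}" for l, o in zip(LETTERS, ex_opts))
        answer = rng.choice(LETTERS[: len(ex_opts)])  # randomized label
        blocks.append(f"{ex_q}\n{choices}\nAnswer: {answer}")
    choices = "\n".join(f"{l}. {o}" for l, o in zip(LETTERS, options))
    blocks.append(f"{question}\n{choices}\nAnswer:")
    return "\n\n".join(blocks)

def model_distribution(question, options, icl_examples, n_runs=5, seed=0):
    """Average answer-letter probabilities over n_runs randomized prompts."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        prompt = build_prompt(question, options, icl_examples, rng)
        probs = first_token_letter_probs(prompt)
        runs.append(probs[: len(options)])
    return np.mean(runs, axis=0)

# Example call with one illustrative in-context example.
icl = [("Is the sky blue?", ["Yes", "No"])]
print(model_distribution("Do you trust science?", ["Yes", "No", "Unsure"], icl))
```

Averaging over the five randomized prompts is what the quoted setup uses to wash out any bias the randomly assigned in-context "correct" labels might introduce.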