ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Authors: Jingnan Zheng, Han Wang, An Zhang, Nguyen Duy Tai, Jun Sun, Tat-Seng Chua
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across three aspects of human values (stereotypes, morality, and legality) demonstrate that ALI-Agent, as a general evaluation framework, effectively identifies model misalignment. Systematic analysis also validates that the generated test scenarios represent meaningful use cases and integrate enhanced measures to probe long-tail risks. |
| Researcher Affiliation | Academia | Jingnan Zheng, National University of Singapore (jingnan.zheng@u.nus.edu); Han Wang, University of Illinois Urbana-Champaign (hanw14@illinois.edu); An Zhang, National University of Singapore (anzhang@u.nus.edu); Tai D. Nguyen, Singapore Management University (dtnguyen.2019@smu.edu.sg); Jun Sun, Singapore Management University (junsun@smu.edu.sg); Tat-Seng Chua, National University of Singapore (dcscts@nus.edu.sg) |
| Pseudocode | Yes | The framework is depicted in Figure 2, with the detailed workflow illustrated in Algorithm 1 and a comprehensive example provided in Figure 3. Algorithm 1: ALI-Agent. (An illustrative sketch of this workflow is given after the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/SophieZheng998/ALI-Agent.git. |
| Open Datasets | Yes | To verify ALI-Agent's effectiveness as a general evaluation framework, we conduct experiments on six datasets from three distinct aspects of human values: stereotypes (DecodingTrust [11], CrowS-Pairs [2]), morality (ETHICS [3], Social Chemistry 101 [37]), and legality (Singapore Rapid Transit Systems Regulations, AdvBench [38]), where five of them follow prevailing evaluation benchmarks, and Singapore Rapid Transit Systems Regulations is a body of laws collected online [39]. Appendix D.1 provides detailed descriptions of the datasets. |
| Dataset Splits | Yes | The training data comprises 90% of the labeled data, with the remaining 10% used for validation. |
| Hardware Specification | Yes | For proprietary target LLMs, we employed a single NVIDIA RTX A5000 to run training and testing. For open-source models, we employed 8 Tesla V100-SXM2-32GB-LS to meet the requirements (Llama2 70B is the largest open-source model we have evaluated). ... For fine-tuning Llama 2 as evaluators, we employed 4 NVIDIA RTX A5000 for about 5 hours. |
| Software Dependencies | No | The paper mentions models like GPT-4-1106-preview and Llama 2-7B, but does not provide specific version numbers for general software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Training is conducted for 15 epochs using a batch size of 16, a learning rate of 1e-5 with linear decay to 0, a weight decay of 0.1, and a maximum sequence length of 512. (An illustrative mapping of these hyperparameters to a training configuration is given after the table.) |
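
The following is a minimal, hypothetical sketch of the agent-based evaluation episode referenced in the Pseudocode row (Algorithm 1). It assumes a generate-then-refine workflow in which a core LLM emulates a realistic test scenario from a seed misconduct, the target LLM responds, a fine-tuned evaluator judges misalignment, and past records are kept in memory as in-context examples; all class, function, and method names here are illustrative assumptions, not the authors' API.

```python
# Hypothetical sketch of an ALI-Agent-style evaluation episode; the methods on
# core_llm, target_llm, and evaluator are illustrative stand-ins for prompt calls.
from dataclasses import dataclass


@dataclass
class EvaluationRecord:
    misconduct: str   # seed misconduct drawn from a benchmark dataset
    scenario: str     # realistic test scenario emulated by the agent
    response: str     # target LLM's response to the scenario
    misaligned: bool  # evaluator's judgement of the response


def evaluate_one(misconduct, core_llm, target_llm, evaluator, memory, max_refinements=3):
    """Emulate a test scenario, then iteratively refine it to probe long-tail
    risks until misalignment is exposed or the refinement budget is exhausted."""
    # Emulation: wrap the raw misconduct in a realistic scenario, conditioning
    # on a few past evaluation records retrieved from memory.
    scenario = core_llm.generate_scenario(misconduct, examples=memory[-5:])

    record = None
    for _ in range(max_refinements):
        response = target_llm.respond(scenario)
        misaligned = evaluator.judge(scenario, response)
        record = EvaluationRecord(misconduct, scenario, response, misaligned)
        if misaligned:
            break
        # Refinement: make the scenario subtler so it can surface long-tail risks.
        scenario = core_llm.refine_scenario(scenario, response)

    memory.append(record)  # persist the outcome as an example for later episodes
    return record
```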
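
For the Experiment Setup row, the reported evaluator fine-tuning hyperparameters could be expressed, for example, as a Hugging Face `TrainingArguments` configuration; this is a hedged sketch only, since the paper does not state which training framework was used, and the output path is illustrative.

```python
# Hypothetical mapping of the reported fine-tuning hyperparameters onto
# Hugging Face TrainingArguments; the actual trainer used is not specified.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-evaluator",   # illustrative output path
    num_train_epochs=15,             # 15 epochs
    per_device_train_batch_size=16,  # batch size 16
    learning_rate=1e-5,              # peak learning rate 1e-5
    lr_scheduler_type="linear",      # linear decay to 0
    weight_decay=0.1,                # weight decay 0.1
)
# The maximum sequence length of 512 would be enforced at tokenization time,
# e.g. tokenizer(text, truncation=True, max_length=512).
```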