Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation

Authors: Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, extensive experiments validate that our method outperforms over 10 baselines across 4 benchmarks."
Researcher Affiliation | Collaboration | "1 Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China; 2 Shanghai AI Laboratory, Shanghai, China; 3 Multi-Agent Governance & Intelligence Crew (MAGIC), Shanghai, China. Correspondence to: Siheng Chen <sihengc@sjtu.edu.cn>."
Pseudocode | No | The paper describes the MATRIX process and its components (Social Roles, Social Modulator) in text and diagrams, but provides no structured pseudocode or algorithm blocks.
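Since no pseudocode is provided, the sketch below illustrates, in plain Python, what the described monopolylogue loop could look like: one LLM drafts a response, role-plays the affected Social Roles, acts as the Social Modulator, and then revises its answer. The `llm` callable, the prompts, and the function name are all hypothetical reconstructions from the paper's textual description, not the authors' code.

```python
from typing import Callable, List

def matrix_self_align(
    llm: Callable[[str], str],   # any text-completion interface
    instruction: str,
    num_roles: int = 3,
) -> str:
    """Hypothetical sketch of the monopolylogue simulation: one LLM plays
    every social role and the modulator, then revises its own answer."""
    # 1. Draft an initial response to the user instruction.
    draft = llm(f"User instruction: {instruction}\nRespond as helpfully as possible.")

    # 2. The same LLM proposes social roles affected by acting on the response.
    roles_text = llm(
        f"List {num_roles} social roles (people or parties) who would be affected "
        f"if the following response were carried out:\n{draft}"
    )
    roles: List[str] = [r.strip() for r in roles_text.splitlines() if r.strip()][:num_roles]

    # 3. Monopolylogue: the LLM role-plays each role's reaction to the draft.
    reactions = [
        llm(f"You are {role}. Describe how the response below affects you:\n{draft}")
        for role in roles
    ]

    # 4. Social Modulator: the LLM summarizes the simulated social consequences.
    critique = llm(
        "Acting as a social modulator, summarize the consequences reported by these "
        "roles and state whether the response is socially acceptable:\n"
        + "\n".join(reactions)
    )

    # 5. The LLM revises its answer conditioned on the simulated consequences.
    return llm(
        f"Instruction: {instruction}\nOriginal response: {draft}\n"
        f"Simulated social consequences: {critique}\n"
        "Rewrite the response so that it remains helpful while avoiding the harms above."
    )
```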
Open Source Code | Yes | "See our project page at https://shuotang123.github.io/MATRIX."
Open Datasets | Yes | "We adopt 4 representative datasets: HH-RLHF (Bai et al., 2022a), with both helpful and harmful instructions; PKU-SafeRLHF (Ji et al., 2023), covering harmful instructions across 14 categories such as insults and privacy; AdvBench (Zou et al., 2023), covering harmful instructions from 5 topics such as disinformation and toxicity; and HarmfulQA (Bhardwaj & Poria, 2023), covering harmful instructions from 10 topics such as social sciences and culture."
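For reference, the first two datasets are publicly hosted on the Hugging Face Hub; a minimal loading sketch is shown below. The repository IDs are assumptions based on the datasets' usual hosting locations and are not stated in the paper.

```python
from datasets import load_dataset

# Repository IDs are assumed, not taken from the paper.
hh_rlhf = load_dataset("Anthropic/hh-rlhf", split="train")            # helpful and harmful dialogues
pku_safe = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")  # harmful instructions, 14 categories

print(len(hh_rlhf), hh_rlhf[0].keys())
```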
Dataset Splits | No | "For our SFT step, we use 6K helpful and harmful training data from the HH-RLHF dataset, respectively."
Hardware Specification | Yes | "Given the low inference speed (approximately hours for a single sample for 30B LLMs on an RTX 3090), we limit generation to 10 samples for each dataset."
Software Dependencies | No | "We employ FastChat (Zheng et al., 2023) to facilitate our fine-tuning; ... We employ QLoRA (Dettmers et al., 2023; Hu et al., 2021) for 3 epochs."
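No versions are pinned for FastChat or QLoRA. A minimal QLoRA setup sketch with the Hugging Face transformers/peft/bitsandbytes stack is shown below; the base model name and LoRA hyperparameters are placeholders, since the paper does not report them here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# 4-bit quantization of the frozen base weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA adapters trained on top of the quantized model
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")  # assumed values
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```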
Experiment Setup | Yes | "The training parameters are summarized in Table 5."

Table 5:
Parameter | Value
Number of epochs | 3
Learning rate | 2e-5
Learning rate decay | Cosine
Batch size | 1
Gradient accumulation steps | 8
Maximum sequence length | 1024
DeepSpeed ZeRO stage | 2
Weight decay | 0.0
Beta β | 0.1
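Expressed as Hugging Face TrainingArguments, Table 5 would roughly correspond to the sketch below. The output directory and DeepSpeed config path are placeholders, the 1024-token maximum sequence length is applied at tokenization rather than through TrainingArguments, and β = 0.1 is a separate loss hyperparameter, not a TrainingArguments field.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="matrix-finetune",        # placeholder path
    num_train_epochs=3,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    weight_decay=0.0,
    deepspeed="ds_zero2_config.json",    # placeholder for a ZeRO stage-2 config
)
```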