Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
Authors: Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, extensive experiments validate that our method outperforms over 10 baselines across 4 benchmarks. |
| Researcher Affiliation | Collaboration | 1Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China 2Shanghai AI Laboratory, Shanghai, China 3Multi-Agent Governance & Intelligence Crew (MAGIC), Shanghai, China. Correspondence to: Siheng Chen <sihengc@sjtu.edu.cn>. |
| Pseudocode | No | The paper describes the process of MATRIX and its components (Social Roles, Social Modulator) in text and diagrams, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | See our project page at https://shuotang123.github.io/MATRIX. |
| Open Datasets | Yes | We adopt 4 representative datasets: HH-RLHF (Bai et al., 2022a) with both helpful and harmful instructions; PKU-SafeRLHF (Ji et al., 2023), covering harmful instructions across 14 categories such as insults and privacy; AdvBench (Zou et al., 2023), covering harmful instructions from 5 topics such as disinformation and toxic; and HarmfulQA (Bhardwaj & Poria, 2023), covering harmful instructions from 10 topics such as social sciences and culture. (See the loading sketch below the table.) |
| Dataset Splits | No | For our SFT step, we use 6K helpful and harmful training data from the HH-RLHF dataset, respectively. |
| Hardware Specification | Yes | Given the low inference speed (on the order of hours for a single sample for 30B LLMs on an RTX 3090), we limit generation to 10 samples for each dataset. |
| Software Dependencies | No | We employ FastChat (Zheng et al., 2023) to facilitate our fine-tuning; ... We employ QLoRA (Dettmers et al., 2023; Hu et al., 2021) for 3 epochs. |
| Experiment Setup | Yes | The training parameters are summarized in Table 5: number of epochs 3; learning rate 2e-5 with cosine decay; batch size 1; gradient accumulation steps 8; maximum sequence length 1024; DeepSpeed ZeRO stage 2; weight decay 0.0; beta β 0.1. (See the configuration sketch below the table.) |
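
The datasets row above lists the four public instruction sets used for evaluation. The sketch below shows one way to pull two of them with the Hugging Face `datasets` library; the Hub identifiers (`Anthropic/hh-rlhf`, `PKU-Alignment/PKU-SafeRLHF`) and the inspected fields are assumptions for illustration, not identifiers taken from the paper, so they should be checked against the project page before use.

```python
# Minimal sketch: loading two of the benchmark datasets named in the paper.
# Hub identifiers below are assumed, not stated in the paper.
from datasets import load_dataset

# HH-RLHF: helpful and harmful dialogue pairs (fields "chosen"/"rejected").
hh_rlhf = load_dataset("Anthropic/hh-rlhf", split="train")

# PKU-SafeRLHF: harmful instructions across 14 harm categories.
pku_safe = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")

# AdvBench and HarmfulQA are distributed by their respective authors;
# their identifiers are omitted here rather than guessed.
print(hh_rlhf[0]["chosen"][:200])  # peek at one preferred response
```
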
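The software-dependencies and experiment-setup rows together describe QLoRA fine-tuning (via FastChat) with the Table 5 hyperparameters. The sketch below maps those hyperparameters onto Hugging Face `transformers`/`peft` configuration objects as a stand-in; the base-model choice, LoRA rank/alpha, output path, and DeepSpeed config file are assumptions not specified in the quoted text.

```python
# Minimal sketch of a QLoRA-style fine-tuning configuration mirroring Table 5.
# LoRA rank/alpha, paths, and precision settings are assumed for illustration.
import torch
from transformers import TrainingArguments, BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for QLoRA (passed to the model loader).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter config (rank and alpha are assumed values).
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="matrix-sft",            # placeholder path
    num_train_epochs=3,                 # Table 5: number of epochs
    learning_rate=2e-5,                 # Table 5: learning rate
    lr_scheduler_type="cosine",         # Table 5: cosine learning-rate decay
    per_device_train_batch_size=1,      # Table 5: batch size
    gradient_accumulation_steps=8,      # Table 5: gradient accumulation steps
    weight_decay=0.0,                   # Table 5: weight decay
    deepspeed="ds_zero2.json",          # DeepSpeed ZeRO stage 2 (assumed filename)
    bf16=True,                          # assumed precision
)

# The maximum sequence length (1024) is enforced at tokenization/trainer level,
# and beta = 0.1 is a preference-optimization coefficient (e.g. DPO-style),
# not a TrainingArguments field.
```

In practice `bnb_config` would be passed when loading the base model and `lora_config` to `get_peft_model` (or to a trainer that wraps them); the snippet only fixes the hyperparameters reported in Table 5.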