Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
Authors: Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, extensive experiments validate that our method outperforms over 10 baselines across 4 benchmarks. |
| Researcher Affiliation | Collaboration | 1Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China 2Shanghai AI Laboratory, Shanghai, China 3Multi-Agent Governance & Intelligence Crew (MAGIC), Shanghai, China. Correspondence to: Siheng Chen <EMAIL>. |
| Pseudocode | No | The paper describes the process of MATRIX and its components (Social Roles, Social Modulator) in text and diagrams, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | See our project page at https://shuotang123.github.io/MATRIX. |
| Open Datasets | Yes | We adopt 4 representative datasets: HH-RLHF (Bai et al., 2022a) with both helpful and harmful instructions; PKU-SafeRLHF (Ji et al., 2023), covering harmful instructions across 14 categories such as insults and privacy; AdvBench (Zou et al., 2023), covering harmful instructions from 5 topics such as disinformation and toxic; and HarmfulQA (Bhardwaj & Poria, 2023), covering harmful instructions from 10 topics such as social sciences and culture. |
| Dataset Splits | No | For our SFT step, we use 6K helpful and harmful training data from HH-RLHF dataset, respectively |
| Hardware Specification | Yes | Given the low inference speed (approximately hours for a single sample for 30B LLMs on an RTX3090), we limit generation to 10 samples for each dataset. |
| Software Dependencies | No | We employ FastChat (Zheng et al., 2023) to facilitate our fine-tuning; ... We employ QLoRA (Dettmers et al., 2023; Hu et al., 2021) for 3 epochs. |
| Experiment Setup | Yes | The training parameters are summarized in Table 5. Table 5 (parameter: value): number of epochs: 3; learning rate: 2e-5; learning rate decay: cosine; batch size: 1; gradient accumulation steps: 8; maximum sequence length: 1024; DeepSpeed ZeRO stage: 2; weight decay: 0.0; beta (β): 0.1. |
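
For readers who want to reuse the settings quoted in the Experiment Setup row, the values can be collected into one configuration object. This is a minimal sketch, not the authors' code: the class name `TrainingConfig`, its field names, and the `effective_batch_size` helper are hypothetical, and only the values are taken from Table 5 as quoted above. The source does not say whether the batch size is per device or how many devices were used, so the device count is left as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container; the default values are those quoted from Table 5."""

    num_epochs: int = 3
    learning_rate: float = 2e-5
    lr_decay: str = "cosine"
    batch_size: int = 1
    gradient_accumulation_steps: int = 8
    max_seq_length: int = 1024
    deepspeed_zero_stage: int = 2
    weight_decay: float = 0.0
    beta: float = 0.1

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Assumption: the quoted batch size is per device (not stated in the source).
        return self.batch_size * self.gradient_accumulation_steps * num_devices


config = TrainingConfig()
print(config.effective_batch_size())  # 8 on a single device
```

Under the per-device assumption, a batch size of 1 with 8 gradient accumulation steps gives 8 samples per optimizer step on one device.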