Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation

Authors: Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, extensive experiments validate that our method outperforms over 10 baselines across 4 benchmarks.
Researcher Affiliation	Collaboration	(1) Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China; (2) Shanghai AI Laboratory, Shanghai, China; (3) Multi-Agent Governance & Intelligence Crew (MAGIC), Shanghai, China. Correspondence to: Siheng Chen <EMAIL>.
Pseudocode	No	The paper describes the process of MATRIX and its components (Social Roles, Social Modulator) in text and diagrams, but does not provide structured pseudocode or algorithm blocks.
Open Source Code	Yes	See our project page at https://shuotang123.github.io/MATRIX.
Open Datasets	Yes	We adopt 4 representative datasets: HH-RLHF (Bai et al., 2022a) with both helpful and harmful instructions; PKU-SafeRLHF (Ji et al., 2023), covering harmful instructions across 14 categories such as insults and privacy; AdvBench (Zou et al., 2023), covering harmful instructions from 5 topics such as disinformation and toxic; and HarmfulQA (Bhardwaj & Poria, 2023), covering harmful instructions from 10 topics such as social sciences and culture.
Dataset Splits	No	For our SFT step, we use 6K helpful and harmful training data from HH-RLHF dataset, respectively
Hardware Specification	Yes	Given the low inference speed (approximately hours for a single sample for 30B LLMs on an RTX3090), we limit generation to 10 samples for each dataset.
Software Dependencies	No	We employ FastChat (Zheng et al., 2023) to facilitate our fine-tuning; ... We employ QLoRA (Dettmers et al., 2023; Hu et al., 2021) for 3 epochs.
Experiment Setup	Yes	The training parameters are summarized in Table 5. Table 5 (parameter: value): number of epochs: 3; learning rate: 2e-5; learning rate decay: cosine; batch size: 1; gradient accumulation steps: 8; maximum sequence length: 1024; DeepSpeed ZeRO stage: 2; weight decay: 0.0; beta (β): 0.1.
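
As an illustration of how the Table 5 values quoted in the Experiment Setup row fit together, the sketch below transcribes them into a plain Python dictionary and derives the effective batch size, the number of optimizer steps, and a cosine learning-rate curve. This is not the authors' training code. The dictionary keys, the function names, the `num_devices` parameter, and the choices of no warmup and decay to zero are assumptions made for this example; the source states only the parameter values.

```python
import math

# Values transcribed from Table 5 as quoted in the Experiment Setup row.
# The key names are hypothetical; only the values come from the source.
TRAINING_PARAMS = {
    "num_epochs": 3,
    "learning_rate": 2e-5,
    "lr_decay": "cosine",
    "batch_size": 1,
    "gradient_accumulation_steps": 8,
    "max_sequence_length": 1024,
    "deepspeed_zero_stage": 2,
    "weight_decay": 0.0,
    "beta": 0.1,
}


def effective_batch_size(params, num_devices=1):
    """Samples per optimizer step. num_devices is an assumption, not from the source."""
    return params["batch_size"] * params["gradient_accumulation_steps"] * num_devices


def optimizer_steps(params, num_examples, num_devices=1):
    """Total optimizer steps, assuming the last partial batch of each epoch is kept."""
    per_epoch = math.ceil(num_examples / effective_batch_size(params, num_devices))
    return per_epoch * params["num_epochs"]


def cosine_lr(params, step, total_steps):
    """Cosine decay from the peak rate to zero, assuming no warmup and no floor."""
    progress = min(step, total_steps) / total_steps
    return params["learning_rate"] * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With batch size 1 and 8 gradient accumulation steps, `effective_batch_size(TRAINING_PARAMS)` is 8 on a single device. For a hypothetical training set of 12,000 examples, `optimizer_steps(TRAINING_PARAMS, 12000)` gives 4,500 steps over the 3 epochs; the actual example count and device count should be taken from the paper.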