Aligning to Thousands of Preferences via System Message Generalization

Authors: Seongyun Lee, Sue Hyun Park, Seungone Kim, Minjoon Seo

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using this dataset, we train a 7B LLM called JANUS and test it on 921 prompts from 5 benchmarks (AlpacaEval 2.0, FLASK, Koala, MT-Bench, and Self-Instruct) by adding system messages that reflect unseen user values. (A minimal sketch of this system-message evaluation protocol follows the table.)
Researcher Affiliation | Academia | KAIST AI, Carnegie Mellon University
Pseudocode | No | The paper describes its methods in narrative text and does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code, dataset, benchmark, and models are available at https://lklab.kaist.ac.kr/Janus/.
Open Datasets | Yes | We first select 66k instructions from a pool of five existing high-quality preference datasets: Chatbot Arena Conversations [92], Domain-Specific Preference dataset [8], UltraFeedback-binarized-clean [9], Nectar [94], and OpenHermesPreferences [23].
Dataset Splits | No | The paper describes training on the MULTIFACETED COLLECTION and evaluating on benchmarks (MULTIFACETED BENCH, AlpacaEval 2.0, MT-Bench, Arena Hard Auto v0.1) that serve as test sets. However, it does not state whether a separate validation split was used for model training or hyperparameter tuning, nor give its size or proportion.
Hardware Specification | Yes | To train JANUS 7B, we utilize four NVIDIA A100 80GB GPUs, and for inference, four NVIDIA RTX A6000 GPUs are employed. Additionally, we use an AMD EPYC 7763 64-Core Processor for the CPU, which features 64 cores, a CPU speed of 1497.674 MHz, and a cache size of 512KB.
Software Dependencies | No | The paper names the libraries it uses (e.g., the axolotl library, OpenRLHF, the vLLM library, FlashAttention-2, DeepSpeed ZeRO-3) but does not provide version numbers for any of them, which a reproducible description requires. (A sketch for recording these versions follows the table.)
Experiment Setup | Yes | For instruction tuning, the configuration includes a maximum sequence length of 8192, gradient accumulation steps of 4, a micro batch size of 2, and four epochs. We use the adamw_bnb_8bit optimizer, with a cosine learning rate scheduler and a learning rate of 5e-6. Additionally, we employ gradient checkpointing, FlashAttention-2 [10], and mixed precision for efficient training. Warm-up steps are set at 10 and weight decay at 0, with checkpoints saved after each epoch.
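
To make the Experiment Setup row concrete, the sketch below expresses the reported hyperparameters as a Hugging Face TrainingArguments object. This is a minimal illustration only: the authors train with the axolotl library, so the Trainer API, the output path, and the use of bf16 for mixed precision are assumptions, not the authors' configuration.

```python
# Minimal sketch: the Experiment Setup hyperparameters expressed as Hugging Face
# TrainingArguments. The paper trains JANUS 7B with the axolotl library, so this
# is only an illustrative equivalent, not the authors' configuration file.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="janus-7b-sft",          # hypothetical output path
    num_train_epochs=4,                 # four epochs
    per_device_train_batch_size=2,      # micro batch size of 2
    gradient_accumulation_steps=4,      # gradient accumulation steps of 4
    learning_rate=5e-6,                 # learning rate of 5e-6
    lr_scheduler_type="cosine",         # cosine learning rate scheduler
    warmup_steps=10,                    # warm-up steps set at 10
    weight_decay=0.0,                   # weight decay of 0
    optim="adamw_bnb_8bit",             # 8-bit AdamW optimizer
    gradient_checkpointing=True,        # gradient checkpointing enabled
    bf16=True,                          # mixed precision (bf16 is an assumption)
    save_strategy="epoch",              # checkpoints saved after each epoch
)

# The 8192-token maximum sequence length is applied at tokenization time,
# e.g. tokenizer(..., truncation=True, max_length=8192), and FlashAttention-2
# is enabled when loading the model (attn_implementation="flash_attention_2").
```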
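The evaluation protocol noted in the Research Type row, i.e. prepending a system message that encodes an unseen user's preferences to each benchmark instruction, can be sketched as follows. The model identifier, the example system message, and the use of the transformers chat-template API (rather than the vLLM setup mentioned in the paper) are assumptions for illustration.

```python
# Minimal sketch of the evaluation protocol: a preference-describing system
# message is prepended to the benchmark instruction before generation.
# The model id and the system message below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaist-ai/janus-7b"  # assumed Hugging Face id for JANUS 7B
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # System message reflecting a (hypothetical) unseen user's values.
    {"role": "system", "content": "You are an assistant for a novice cook who "
     "prefers concise, step-by-step answers with metric units."},
    # Benchmark instruction (e.g., from AlpacaEval 2.0 or MT-Bench).
    {"role": "user", "content": "How do I make a simple tomato soup?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```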
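Because the Software Dependencies row flags missing version numbers, anyone reproducing the setup would need to record them in their own environment. The snippet below is one way to do so with importlib.metadata; the distribution names are assumed mappings to PyPI packages for the libraries mentioned in the paper.

```python
# Sketch: record installed versions of the libraries named in the paper, since
# the paper itself does not pin them. Distribution names are assumptions about
# the corresponding PyPI packages.
from importlib.metadata import PackageNotFoundError, version

packages = ["axolotl", "openrlhf", "vllm", "flash-attn", "deepspeed",
            "transformers", "torch"]

for name in packages:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
```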