Aligning to Thousands of Preferences via System Message Generalization
Authors: Seongyun Lee, Sue Hyun Park, Seungone Kim, Minjoon Seo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using this dataset, we train a 7B LLM called JANUS and test it on 921 prompts from 5 benchmarks (AlpacaEval 2.0, FLASK, Koala, MT-Bench, and Self-Instruct) by adding system messages that reflect unseen user values. |
| Researcher Affiliation | Academia | KAIST AI, Carnegie Mellon University |
| Pseudocode | No | The paper describes its methods in narrative text and does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code, dataset, benchmark, and models are available at https://lklab.kaist.ac.kr/Janus/. |
| Open Datasets | Yes | We first select 66k instructions from a pool of five existing high-quality preference datasets: Chatbot Arena Conversations [92], Domain-Specific Preference dataset [8], UltraFeedback-binarized-clean [9], Nectar [94], and OpenHermesPreferences [23]. |
| Dataset Splits | No | The paper describes training on the MULTIFACETED COLLECTION and evaluating on various benchmarks (MULTIFACETED BENCH, AlpacaEval 2.0, MT-Bench, Arena-Hard-Auto v0.1), which serve as test sets. However, it does not explicitly state whether a separate validation split was used for model training or hyperparameter tuning, nor report its size or proportion. |
| Hardware Specification | Yes | To train JANUS 7B, we utilize four NVIDIA A100 80GB GPUs, and for inference, four NVIDIA RTX A6000 GPUs are employed. Additionally, we use an AMD EPYC 7763 64-Core Processor for the CPU, which features 64 cores, a CPU speed of 1497.674 MHz, and a cache size of 512KB. |
| Software Dependencies | No | The paper names several libraries (e.g., axolotl, OpenRLHF, vLLM, FlashAttention-2, DeepSpeed ZeRO-3) but does not provide version numbers for these software dependencies, which would be needed for an exactly reproducible setup. |
| Experiment Setup | Yes | For instruction tuning, the configuration includes a maximum sequence length of 8192, gradient accumulation steps of 4, a micro batch size of 2, and four epochs. We use the adamw_bnb_8bit optimizer, with a cosine learning rate scheduler and a learning rate of 5e-6. Additionally, we employ gradient checkpointing, Flash Attention-2 [10], and mixed precision for efficient training. Warm-up steps are set at 10 and weight decay at 0, with checkpoints saved after each epoch. |
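
The Experiment Setup row lists the instruction-tuning hyperparameters reported in the paper. As a rough aid, the sketch below maps those reported values onto Hugging Face `TrainingArguments`; the authors actually train with the axolotl library, so this is only an illustrative translation, and the output directory and the choice of bf16 as the unspecified mixed-precision mode are assumptions.

```python
# Illustrative sketch only: the paper trains JANUS with the axolotl library;
# here the reported hyperparameters are mapped onto Hugging Face
# TrainingArguments so the configuration is concrete. output_dir and bf16
# (as the unspecified "mixed precision" mode) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="janus-7b-sft",          # assumed name
    num_train_epochs=4,                 # four epochs
    per_device_train_batch_size=2,      # micro batch size of 2
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    weight_decay=0.0,
    optim="adamw_bnb_8bit",             # 8-bit AdamW via bitsandbytes
    gradient_checkpointing=True,
    bf16=True,                          # assumed mixed-precision mode
    save_strategy="epoch",              # checkpoints saved after each epoch
)

# The 8192-token maximum sequence length and FlashAttention-2 are configured
# on the tokenizer/model side (e.g., attn_implementation="flash_attention_2"
# when loading the model), not through TrainingArguments.
```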
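
The Research Type row notes that JANUS is evaluated by adding system messages that reflect unseen user values to benchmark prompts. Below is a minimal sketch of that pattern using the Transformers chat-template API; the model identifier `kaist-ai/janus-7b` and the example system message are assumptions for illustration, and this is not the authors' evaluation harness (which runs inference with vLLM).

```python
# Minimal sketch (not the authors' evaluation harness): prepend a system
# message describing an unseen user preference to a benchmark-style prompt
# and generate a response. The model id "kaist-ai/janus-7b" and the example
# system message are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaist-ai/janus-7b"  # assumed Hugging Face identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {
        "role": "system",  # preference-reflecting system message
        "content": "You are an assistant for a reader who prefers concise, "
                   "step-by-step explanations grounded in everyday analogies.",
    },
    {"role": "user", "content": "Explain why the sky is blue."},  # benchmark-style prompt
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```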