Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Authors: Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings. We quantitatively evaluate Dromedary on benchmark datasets and also assess its qualitative performance on several datasets for demonstration purposes. |
| Researcher Affiliation | Collaboration | Zhiqing Sun (1), Yikang Shen (2), Qinhong Zhou (3), Hongxin Zhang (3), Zhenfang Chen (2), David Cox (2), Yiming Yang (1), Chuang Gan (2, 3) — (1) Language Technologies Institute, CMU; (2) MIT-IBM Watson AI Lab, IBM Research; (3) UMass Amherst |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have open-sourced the code, LoRA weights of Dromedary, and our synthetic training data to encourage further research into aligning LLM-based AI agents with enhanced supervision efficiency, reduced biases, and improved controllability. https://github.com/IBM/Dromedary |
| Open Datasets | Yes | We quantitatively evaluate Dromedary on benchmark datasets... TruthfulQA benchmark [22]... BIG-bench HHH Eval [39, 3]... Chiang et al. [8] introduced an evaluation framework leveraging GPT-4 [27] to automate the assessment of chatbot performance... We have open-sourced the code, LoRA weights of Dromedary, and our synthetic training data to encourage further research... |
| Dataset Splits | No | The paper fine-tunes on an aggregated dataset but does not explicitly specify a validation split or its purpose for reproduction (e.g., hyperparameter tuning, early stopping). |
| Hardware Specification | No | We would also like to thank the computation support from AiMOS, a server cluster for the IBM Research AI Hardware Center. This mentions a "server cluster" but lacks specific hardware details such as GPU/CPU models or memory used for the experiments. |
| Software Dependencies | No | The paper mentions software components like 'LoRA' and refers to 'huggingface/peft' and 'tloen/alpaca-lora' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Principle Engraving: We fine-tune the base LLaMA-65b model [44] on our aggregated Self-Instruct and Topic-Guided Red-Teaming Self-Instruct dataset for 1 epoch. We only fine-tune the LoRA weights [17] in the multi-head attention modules. We use a batch size of 768, a maximal sequence length of 512, and a max learning rate of 4e-4. A 1-epoch (approximately 335 steps) training schedule is used, where the learning rate increases (i.e., warm-up) in the first 100 steps with a log curve, and decays linearly to zero in the rest of the training steps. |
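The learning-rate schedule quoted above (logarithmic warm-up for 100 steps, then linear decay to zero over a ~335-step run) can be sketched as follows. This is a minimal illustration, not the authors' code: the paper does not give a formula for the "log curve", so the warm-up shape below (`log(step + 1) / log(warmup + 1)`) is an assumption, and the function name `lr_schedule` is hypothetical.

```python
import math

def lr_schedule(step, max_lr=4e-4, warmup_steps=100, total_steps=335):
    """Per-step learning rate: log-shaped warm-up, then linear decay to zero.

    The log warm-up form is an assumed interpolation; the paper only
    states that the rate "increases in the first 100 steps with a log
    curve" before decaying linearly over the remaining steps.
    """
    if step < warmup_steps:
        # Assumed log warm-up: rises steeply early, flattens as it
        # approaches max_lr at step == warmup_steps.
        return max_lr * math.log(step + 1) / math.log(warmup_steps + 1)
    # Linear decay from max_lr (end of warm-up) down to zero at total_steps.
    remaining = total_steps - warmup_steps
    return max_lr * max(0.0, (total_steps - step) / remaining)
```

A schedule like this would typically be wired into the optimizer via a per-step callback (e.g. a `LambdaLR`-style wrapper in PyTorch) rather than called by hand.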