Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

Authors: Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings. We quantitatively evaluate Dromedary on benchmark datasets and also assess its qualitative performance on several datasets for demonstration purposes.
Researcher Affiliation Collaboration Zhiqing Sun1 Yikang Shen2 Qinhong Zhou3 Hongxin Zhang3 Zhenfang Chen2 David Cox2 Yiming Yang1 Chuang Gan2,3 1Language Technologies Institute, CMU 2MIT-IBM Watson AI Lab, IBM Research 3UMass Amherst
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes We have open-sourced the code, LoRA weights of Dromedary, and our synthetic training data to encourage further research into aligning LLM-based AI agents with enhanced supervision efficiency, reduced biases, and improved controllability. https://github.com/IBM/Dromedary
Open Datasets Yes We quantitatively evaluate Dromedary on benchmark datasets... TruthfulQA benchmark [22]... BIG-bench HHH Eval [39, 3]... Chiang et al. [8] introduced an evaluation framework leveraging GPT-4 [27] to automate the assessment of chatbot performance... We have open-sourced the code, LoRA weights of Dromedary, and our synthetic training data to encourage further research...
Dataset Splits No The paper fine-tunes on an aggregated dataset but does not explicitly specify a validation split or its purpose for reproduction (e.g., hyperparameter tuning, early stopping).
Hardware Specification No We would also like to thank the computation support from AiMOS, a server cluster for the IBM Research AI Hardware Center. This mentions a "server cluster" but lacks specific hardware details such as GPU/CPU models or memory used for experiments.
Software Dependencies No The paper mentions software components like 'LoRA' and refers to 'huggingface/peft' and 'tloen/alpaca-lora' but does not specify version numbers for these or other software dependencies.
Experiment Setup Yes Principle Engraving We fine-tune the base LLaMA-65b model [44] on our aggregated Self-Instruct and Topic-Guided Red-Teaming Self-Instruct dataset for 1 epoch. We only fine-tune the LoRA weights [17] in the multi-head attention modules. We use a batch size of 768, a maximal sequence length of 512, and a max learning rate of 4e-4. A 1-epoch (approximately 335 steps) training schedule is used, where the learning rate increases (i.e., warm-up) in the first 100 steps with a log curve, and decays linearly to zero in the rest of the training steps.
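For reproduction purposes, the described schedule (log-curve warm-up over the first 100 steps, then linear decay to zero over the remaining ~235 steps, peaking at 4e-4) can be sketched as below. Note the paper does not give the exact log-warm-up formula; the log1p normalization here is an assumption.

```python
import math

def lr_at_step(step: int,
               max_lr: float = 4e-4,
               warmup_steps: int = 100,
               total_steps: int = 335) -> float:
    """Sketch of the paper's schedule: log-curve warm-up for the first
    `warmup_steps`, then linear decay to zero. The exact shape of the
    log warm-up is not specified in the paper; this uses a log1p ratio
    so the rate reaches max_lr at the end of warm-up."""
    if step < warmup_steps:
        # Log-shaped warm-up: rises steeply at first, flattens near max_lr.
        return max_lr * math.log1p(step) / math.log1p(warmup_steps - 1)
    # Linear decay from max_lr down to zero at total_steps.
    remaining = max(total_steps - step, 0)
    return max_lr * remaining / (total_steps - warmup_steps)
```

The function is continuous at the warm-up boundary (both branches give max_lr at steps 99 and 100), which makes it easy to plug into an optimizer-agnostic scheduler loop.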