The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

Authors: Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We analyze the effect of alignment tuning by examining the token distribution shift between base LLMs and their aligned counterpart... We conduct a fine-grained and interpretable evaluation on a diverse set of examples, named just-eval-instruct. Results demonstrate that base LLMs with URIAL can match or even surpass the performance of LLMs aligned with SFT (Mistral-7b-Instruct) or SFT+RLHF (Llama-2-70b-chat)." |
| Researcher Affiliation | Collaboration | Allen Institute for Artificial Intelligence; University of Washington |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://allenai.github.io/re-align |
| Open Datasets | Yes | "We create a dataset named just-eval-instruct which contains 1,000 diverse instructions from 9 existing datasets, such as those used by Alpaca Eval (Li et al., 2023a), MT-bench (Zheng et al., 2023), and LIMA (Zhou et al., 2023). We also release the annotations we gathered for community use in evaluation and training open-source LLM evaluators." |
| Dataset Splits | No | The paper uses just-eval-instruct for evaluation only and reports no train/validation/test splits, since URIAL is a tuning-free alignment method. |
| Hardware Specification | No | The paper does not describe the hardware (e.g., GPU or CPU models, or cloud instance types) used to run its experiments. |
| Software Dependencies | No | The paper mentions software such as MPNet, Sentence Transformers, FAISS, GPT-4, and ChatGPT, but does not give version numbers for these dependencies (a sketch of this retrieval stack follows the table). |
| Experiment Setup | Yes | "We choose to use greedy decoding (i.e., zero temperature) in all experiments for reproducibility. Additionally, we impose a repetition penalty of 1.1 on base LLMs to prevent degeneration." URIAL uses three constant in-context examples by default to limit the prefix length; the full prompt is shown in Appendix D. GPT-4 evaluates the 800 regular instructions on the first five aspects, while ChatGPT evaluates the 200 red-teaming and malicious instructions for the safety aspect; annotation templates are in Appendix G.5 (sketches of the decoding and judge setups follow the table). |
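The paper names MPNet, Sentence Transformers, and FAISS among its dependencies (used for its retrieval-based ICL baseline) without pinning versions. Below is a minimal sketch of how that stack is typically wired together; the encoder checkpoint, candidate pool, and k=3 are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a Sentence Transformers + FAISS retrieval stack,
# as mentioned in the paper's dependencies. The checkpoint, pool,
# and k=3 are illustrative assumptions, not the paper's settings.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

# MPNet-based sentence encoder (assumed checkpoint).
encoder = SentenceTransformer("all-mpnet-base-v2")

# Hypothetical pool of candidate in-context examples.
pool = [
    "Explain the difference between TCP and UDP.",
    "Write a short poem about autumn.",
    "Summarize the causes of World War I.",
]

# Normalized embeddings so inner product equals cosine similarity.
emb = encoder.encode(pool, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

# Retrieve the 3 most similar examples for a new instruction.
query = encoder.encode(
    ["Explain how HTTPS works."], normalize_embeddings=True
).astype("float32")
scores, ids = index.search(query, k=3)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {pool[i]}")
```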
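To illustrate the decoding setup quoted in the last row (greedy decoding, repetition penalty of 1.1, a constant few-shot URIAL prefix), here is a minimal sketch using Hugging Face transformers. The base-model checkpoint and the abbreviated one-example prompt are assumptions; the full three-example URIAL prompt is given in the paper's Appendix D.

```python
# Minimal sketch of URIAL-style inference on a base LLM: a constant
# in-context prefix plus greedy decoding with a repetition penalty,
# matching the settings quoted in the table. The checkpoint and the
# abbreviated prompt are assumptions; Appendix D has the real prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed base (untuned) checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Stand-in for the three constant URIAL examples (heavily abbreviated).
urial_prefix = (
    "# Query:\nWhat is the capital of France?\n\n"
    "# Answer:\nThe capital of France is Paris. ...\n\n"
)
prompt = urial_prefix + "# Query:\nExplain photosynthesis briefly.\n\n# Answer:\n"

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    do_sample=False,          # greedy decoding (zero temperature)
    repetition_penalty=1.1,   # penalty the paper applies to base LLMs
    max_new_tokens=256,
)
# Decode only the newly generated continuation.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```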
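The judge setup (GPT-4 scoring the 800 regular instructions on the first five aspects) can likewise be sketched. The prompt below only paraphrases the idea; the actual annotation templates are in Appendix G.5, and the judge model identifier is an assumption.

```python
# Minimal sketch of multi-aspect LLM-as-judge scoring, in the spirit
# of the paper's GPT-4 evaluation. The template merely paraphrases the
# idea; the real annotation templates are in Appendix G.5.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TEMPLATE = """Rate the response to the instruction from 1 to 5 on each aspect:
helpfulness, clarity, factuality, depth, and engagement. Briefly justify each score.

Instruction: {instruction}
Response: {response}"""

def judge(instruction: str, response: str) -> str:
    """Ask the judge model for per-aspect scores on one example."""
    out = client.chat.completions.create(
        model="gpt-4",  # assumed judge model identifier
        temperature=0,
        messages=[{
            "role": "user",
            "content": TEMPLATE.format(instruction=instruction, response=response),
        }],
    )
    return out.choices[0].message.content

print(judge("Explain photosynthesis briefly.",
            "Plants convert light into chemical energy..."))
```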