FLAME: Factuality-Aware Alignment for Large Language Models

Authors: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Scott Yih, Xilun Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our proposed FLAME guides LLMs to output more factual responses while maintaining their instruction-following capability.
Researcher Affiliation | Collaboration | University of Waterloo, Carnegie Mellon University, Meta AI
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | While we do not provide the code to reproduce the main experimental results, we provide all the necessary information and URL links to training and evaluation data.
Open Datasets | Yes | At the SFT stage, we fine-tune PT on two seed datasets: (1) instruction-following training (IFT) data from Li et al. [2024], consisting of 3200 instruction-response pairs created by humans from the Open Assistant dataset [OASST; Köpf et al., 2023]; (2) evaluation fine-tuning (EFT) data from Yuan et al. [2024].
Dataset Splits | No | For the experiment, we compile training and evaluation datasets comprising 500 and 183 diverse human entities, respectively (further details provided in Appendix A.1). The paper explicitly mentions training and evaluation (test) sets for some experiments but does not define a separate validation set.
Hardware Specification | Yes | We conduct fine-tuning with full parameters on 64 NVIDIA A100 (80GB) GPUs.
Software Dependencies | No | The paper mentions several software components and models (e.g., Llama-2 70B, FACTSCORE, DRAGON+, nltk.tokenize) but does not provide version numbers for these dependencies, which would be needed for exact reproducibility.
Experiment Setup | Yes | We fine-tune our models for 500 steps with batch sizes of 32 and 64 at the SFT and DPO stages, respectively. The learning rate and maximum sequence length are set to 1e-6 (decaying to 1e-7) and 2048, respectively. At the SFT stage, we mix the IFT and EFT data, while at the DPO stage, we set β = 0.1 and uniformly sample between self-rewarding (x, y+, y−) and factuality-reward (x, y_true, y_false) preference data. A sketch of this configuration appears below.
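The Experiment Setup row is the most directly actionable part of the table: it fixes the DPO temperature β = 0.1 and states that preference pairs are drawn uniformly from two sources. The sketch below restates that configuration as code. It is a minimal illustration of the standard DPO objective and of the uniform mixing step, not the authors' implementation; the function, argument, and container names (dpo_loss, sample_preference_batch, self_reward_pairs, factuality_pairs) are hypothetical.

```python
import random

import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective with beta = 0.1 as reported in the paper.

    Each argument is a 1-D tensor holding the summed log-probability of the
    preferred (chosen) or dispreferred (rejected) response under the policy
    being trained or under the frozen reference (SFT) model.
    """
    # Implicit reward: log-ratio of policy vs. reference, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin between chosen and rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


def sample_preference_batch(self_reward_pairs, factuality_pairs,
                            batch_size=64, seed=0):
    """Uniformly mix the two preference sources described in the paper:
    self-rewarding pairs (x, y+, y−) and factuality-reward pairs
    (x, y_true, y_false). Container and argument names are illustrative.
    """
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        source = rng.choice((self_reward_pairs, factuality_pairs))
        batch.append(rng.choice(source))
    return batch
```

In the reported setup, this loss would be minimized for 500 steps at batch size 64 with a learning rate decaying from 1e-6 to 1e-7, presumably using the SFT checkpoint as the frozen reference model, as is standard for DPO.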