Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment

Authors: Geyang Guo, Ranchi Zhao, Tianyi Tang, Xin Zhao, Ji-Rong Wen

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments have demonstrated the effectiveness of our approaches by comparing a number of competitive baselines. We release all the above-mentioned resources at https://github.com/RUCAIBox/FIGA." and Section 4 ("Experiment").
Researcher Affiliation | Academia | (1) Gaoling School of Artificial Intelligence, Renmin University of China; (2) School of Information, Renmin University of China.
Pseudocode | Yes | "Algorithm 1: FIGA (Leveraging Fine-grained Quality Signals for Alignment)"; a token-level loss sketch is given after this table.
Open Source Code | Yes | "We release all the above-mentioned resources at https://github.com/RUCAIBox/FIGA."
Open Datasets | Yes | "For our SPA dataset mentioned in Section 3.1, we broadly select the following datasets as our initial instance pool: HH-RLHF (Bai et al., 2022a), ShareGPT (ShareGPT, 2023), Instruct GPT-J Pairwise (Dahoas, 2023), SHP (Ethayarajh et al., 2022), and OpenOrca (Lian et al., 2023)."
Dataset Splits | No | The paper describes training on various datasets and evaluating on a test set, but it does not explicitly specify a validation split (e.g., percentages or example counts).
Hardware Specification | No | The paper does not report the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions the OpenLLaMA2 (OpenLLMAI, 2023) library and gpt-3.5-turbo, but gives no version numbers for the software dependencies needed to replicate the experiments.
Experiment Setup | Yes | "For SFT, we set the learning rate to 1e-5 and the batch size to 128. We conduct 5 epochs of training... For FIGA, we set the parameters α = 1, β = 0.5, γ = 0 respectively." See the loss sketch after this table for how these weights could enter a token-weighted objective.
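The "Pseudocode" and "Experiment Setup" rows refer to Algorithm 1 and the weights α = 1, β = 0.5, γ = 0. Below is a minimal sketch of how such a fine-grained, token-weighted alignment objective could look; it is an illustration based on the paper's description, not the authors' released code (see https://github.com/RUCAIBox/FIGA for that). The function name `figa_loss`, the mask names, and the choice to weight unchanged revised-response tokens by γ are assumptions made for illustration.

```python
# Minimal sketch (not the authors' implementation) of a FIGA-style
# token-weighted alignment loss, using the hyperparameters reported in the
# paper (alpha = 1, beta = 0.5, gamma = 0). Mask and variable names are assumed.
import torch


def figa_loss(good_logprobs: torch.Tensor,   # log p(y_hat_t | x, y_hat_<t), revised response
              added_mask: torch.Tensor,      # bool, True where the reviser added/changed a token
              bad_logprobs: torch.Tensor,    # log p(y_t | x, y_<t), model's initial response
              removed_mask: torch.Tensor,    # bool, True where the reviser dropped a token
              alpha: float = 1.0,
              beta: float = 0.5,
              gamma: float = 0.0) -> torch.Tensor:
    # Encourage tokens introduced by the revision with weight alpha; tokens the
    # revision left unchanged get weight gamma (0 in the reported setting).
    good_weights = added_mask.float() * alpha + (~added_mask).float() * gamma
    encourage = -(good_weights * good_logprobs).sum()

    # Unlikelihood-style penalty: raising the probability of tokens that only
    # appear in the sub-optimal initial response increases the loss (weight beta).
    penalize = beta * bad_logprobs[removed_mask].sum()

    return encourage + penalize
```

In the paper, such token-level signals are derived by comparing the model's initial response with its revised counterpart in the SPA data (Section 3.1); the exact comparison procedure is specified in Algorithm 1 rather than in this sketch.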