Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment
Authors: Geyang Guo, Ranchi Zhao, Tianyi Tang, Xin Zhao, Ji-Rong Wen
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have demonstrated the effectiveness of our approaches by comparing a number of competitive baselines. We release all the above-mentioned resources at https://github.com/RUCAIBox/FIGA. (See also Section 4: EXPERIMENT.) |
| Researcher Affiliation | Academia | 1Gaoling School of Artificial Intelligence, Renmin University of China. 2School of Information, Renmin University of China. |
| Pseudocode | Yes | Algorithm 1: FIGA Leveraging Fine-grained Quality Signals for Alignment |
| Open Source Code | Yes | We release all the above-mentioned resources at https://github.com/RUCAIBox/FIGA. |
| Open Datasets | Yes | For our SPA dataset mentioned in Section 3.1, we broadly select the following datasets as our initial instance pool: HH-RLHF (Bai et al., 2022a), ShareGPT (ShareGPT, 2023), Instruct GPT-J Pairwise (Dahoas, 2023), SHP (Ethayarajh et al., 2022), and OpenOrca (Lian et al., 2023). |
| Dataset Splits | No | The paper mentions training on various datasets and evaluating on a test set, but it does not explicitly provide details about a validation dataset split (e.g., percentages or counts for validation). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'OpenLLaMA2 (OpenLLMAI, 2023) library' and 'gpt-3.5-turbo', but does not provide specific version numbers for any software dependencies required to replicate the experiment. |
| Experiment Setup | Yes | For SFT, we set the learning rate to 1e-5 and the batch size to 128. We conduct 5 epochs of training... For FIGA, we set the parameters α = 1, β = 0.5, γ = 0 respectively. |
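
The α, β, γ values quoted in the experiment setup row are FIGA's fine-grained token weights. Below is a minimal sketch of how such a token-weighted objective could be computed, assuming α rewards preferred tokens, β penalizes undesired tokens, and γ weights the remaining tokens; the function and variable names are illustrative and do not come from the released FIGA code.

```python
import torch
import torch.nn.functional as F

# Weights as reported in the paper: alpha = 1, beta = 0.5, gamma = 0.
ALPHA, BETA, GAMMA = 1.0, 0.5, 0.0

def figa_style_loss(logits, target_ids, token_tags):
    """Token-weighted negative log-likelihood (illustrative sketch).

    logits:     (T, V) model outputs for one sequence
    target_ids: (T,)   target token ids
    token_tags: (T,)   +1 = rewarded token, -1 = penalized token, 0 = neutral
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    weights = torch.full_like(token_logp, GAMMA)
    weights[token_tags > 0] = ALPHA   # encourage these tokens
    weights[token_tags < 0] = -BETA   # suppress these tokens
    return -(weights * token_logp).mean()

# Dummy usage with random tensors.
T, V = 8, 100
logits = torch.randn(T, V, requires_grad=True)
target_ids = torch.randint(0, V, (T,))
token_tags = torch.randint(-1, 2, (T,))
figa_style_loss(logits, target_ids, token_tags).backward()
```

With γ = 0, tokens that are neither rewarded nor penalized contribute nothing to the loss, which matches the setting reported in the setup row.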