Can AI Assistants Know What They Don’t Know?
Authors: Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, Shimin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach aligns an AI assistant (e.g., Llama-2-7b-chat) with a model-specific "I don't know" (Idk) dataset, which catalogues the assistant's known and unknown questions (see the dataset-construction sketch after this table). We construct the Idk dataset from an existing knowledge-intensive open-domain question answering dataset, TriviaQA (Joshi et al., 2017). We conduct systematic experiments to find the most effective method, covering prompting, supervised fine-tuning, and preference-aware optimization. Experimental results show that, after alignment with the Idk dataset, the assistant is more capable of declining to answer questions outside its knowledge scope and is significantly more truthful than the original assistant. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, Fudan University, Shanghai, China; ²Shanghai AI Laboratory, Shanghai, China. |
| Pseudocode | No | The paper describes its methods textually and through diagrams (e.g., Figure 3), but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | Yes | We release our code, data and models at https://github.com/OpenMOSS/Say-I-Dont-Know. |
| Open Datasets | Yes | TriviaQA (Joshi et al., 2017), originally a reading comprehension dataset, is used here for open-domain question answering; it forms the basis of our Idk dataset, with 87,622 training samples and an 11,313-sample test set derived from TriviaQA's development set (its official test set lacks ground-truth answers). For out-of-distribution (OOD) evaluation, we incorporate the Natural Questions (NQ) (Kwiatkowski et al., 2019) and ALCUNA (Yin et al., 2023a) datasets. |
| Dataset Splits | Yes | We partition 10% of the TriviaQA training set to serve as the validation set of the Idk dataset, with the other 90% as the training set. The validation set therefore contains 8,763 samples and the training set 78,899 samples (a split sketch appears after this table). |
| Hardware Specification | Yes | We employ Fully Sharded Data Parallelism (FSDP) to conduct SFT training on 8 A100 80G GPUs. For Llama-2-70b-chat, we train for 10 epochs using 32 A100 80G GPUs and select the checkpoint of the last epoch as the final model. We use 8 A100 80G GPUs for DPO training and 4 A100 80G GPUs for reward model training. We utilize DeepSpeed ZeRO-3 to train for one epoch on 32 A100 80G GPUs (PPO training). We set the batch size to 256 and the learning rate to 2e-5, and train for 3 epochs using 8 A100 80G GPUs. |
| Software Dependencies | No | Following the settings of llama-recipes, our batch size is set to 32, with a learning rate of 1e-4, and we train for 10 epochs. We use DeepSpeed-Chat for PPO training and DeepSpeed ZeRO-3 to train for one epoch on 32 A100 80G GPUs. The paper names software components ('llama-recipes', 'DeepSpeed-Chat', 'DeepSpeed ZeRO-3') but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | Following the settings of llama-recipes, our batch size is set to 32, with a learning rate of 1e-4, and we train for 10 epochs. During training, we save a checkpoint at the end of each epoch and select the checkpoint that performs best on the validation set as the final model. During DPO training, following DPO's official implementation, we set the batch size to 64, the learning rate to 5e-7, and β to 0.1, and train for one epoch (a DPO loss sketch appears after this table). During reward model training, we set the batch size to 128 and the learning rate to 9e-6, and train for one epoch. For PPO, we set the learning rate for both the actor model and the critic model to 1e-6; the generation batch size is 64 and the training batch size is 32. In each training step, we train a single inner epoch. |
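
The paper's model-specific Idk dataset labels each question as "known" or "unknown" for a given assistant. Below is a minimal sketch of one plausible construction procedure, not the authors' released code (see the repository linked above for that). The `generate_answers` and `is_correct` helpers are hypothetical, and the sampling count and knowledge threshold are illustrative assumptions rather than the paper's exact settings.

```python
# Hedged sketch: label each TriviaQA question as "known" or "unknown"
# for a given assistant by sampling several answers and measuring how
# often the assistant gets the question right.

N_SAMPLES = 10        # answers sampled per question (assumption)
IK_THRESHOLD = 0.5    # minimum accuracy to count as "known" (assumption)

def build_idk_dataset(questions, generate_answers, is_correct):
    """`questions` is a list of {"question": str, "gold_answers": list}.
    `generate_answers(q, n)` samples n responses from the assistant;
    `is_correct(answer, gold_answers)` checks a response against gold.
    Both helpers are hypothetical stand-ins for real model/eval code."""
    idk_dataset = []
    for q in questions:
        answers = generate_answers(q["question"], n=N_SAMPLES)
        accuracy = sum(
            is_correct(a, q["gold_answers"]) for a in answers
        ) / N_SAMPLES
        if accuracy >= IK_THRESHOLD:
            # Known question: keep a correct sampled answer as the target.
            target = next(a for a in answers
                          if is_correct(a, q["gold_answers"]))
        else:
            # Unknown question: the training target is a refusal.
            target = "I don't know."
        idk_dataset.append({"question": q["question"], "target": target})
    return idk_dataset
```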
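
The 90/10 train/validation partition quoted above can be reproduced with a seeded shuffle. The paper does not specify the seed or split procedure, so this sketch is illustrative only; it assumes the Idk dataset is a flat list of examples.

```python
import random

def split_idk_dataset(idk_dataset, val_fraction=0.1, seed=0):
    """Return (train, validation) lists. The seed is an assumption,
    chosen only so the partition is reproducible across runs."""
    examples = list(idk_dataset)
    random.Random(seed).shuffle(examples)
    n_val = int(len(examples) * val_fraction)
    return examples[n_val:], examples[:n_val]
```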
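
The DPO setup reported in the experiment row (batch size 64, learning rate 5e-7, β = 0.1) corresponds to the standard DPO objective. The following is a minimal PyTorch sketch of that loss, not the authors' training code; it assumes each argument is a 1-D tensor of summed token log-probabilities for the chosen and rejected responses under the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective with the paper's reported beta = 0.1.
    Each argument: 1-D tensor, one entry per preference pair in the batch."""
    # Implicit reward of the policy relative to the frozen reference model.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```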