The Good, The Bad, and Why: Unveiling Emotions in Generative AI
Authors: Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it. |
| Researcher Affiliation | Collaboration | (1) Microsoft Research, Beijing, China; (2) Institute of Software, CAS, Beijing, China; (3) Department of Computer Science, William & Mary, Williamsburg, Virginia, USA; (4) School of Psychology, Beijing Normal University, Beijing, China; (5) Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links about open-sourcing the code for its described methodology. |
| Open Datasets | Yes | Specifically, we adopted 50 tasks from two popular datasets, including Instruction Induction (Honovich et al., 2022) and BIG-Bench-Hard (Suzgun et al., 2022), to evaluate semantic understanding and logical reasoning abilities, leading to 940,200 evaluations. |
| Dataset Splits | No | The paper mentions using specific datasets for evaluation but does not provide explicit details about training, validation, and test splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as specific GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of large language models and multi-modal models (e.g., GPT-4, Llama2, DALL-E) but does not list specific software dependencies, programming languages, or libraries with version numbers used for their own implementation. |
| Experiment Setup | Yes | For ChatGPT, we utilize gpt-3.5-turbo (0613) and set the temperature parameter to 0.7. For GPT-4 and Llama 2, we also set the temperature to 0.7. The remaining LLMs are evaluated using their default settings. |
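The reported setup (the gpt-3.5-turbo 0613 snapshot, temperature 0.7, and an emotional stimulus appended to the task prompt) could be reproduced along these lines. This is a minimal sketch, not the authors' code: the `build_request` helper and the example stimulus string are illustrative assumptions, and the payload mirrors the OpenAI chat-completions request shape rather than any framework the paper used.

```python
# Sketch of the evaluation settings described in the paper.
# The stimulus text and helper names below are assumptions for illustration.

EMOTION_STIMULUS = "This is very important to my career."  # example EmotionPrompt-style suffix


def build_request(task_prompt: str, use_emotion_prompt: bool = True) -> dict:
    """Assemble a chat-completion payload with the settings reported in the paper."""
    prompt = f"{task_prompt} {EMOTION_STIMULUS}" if use_emotion_prompt else task_prompt
    return {
        "model": "gpt-3.5-turbo-0613",  # the gpt-3.5-turbo (0613) snapshot used for ChatGPT
        "temperature": 0.7,             # temperature reported for ChatGPT, GPT-4, and Llama 2
        "messages": [{"role": "user", "content": prompt}],
    }


request = build_request("Classify the sentiment of: 'I loved this movie.'")
```

Keeping the temperature fixed at 0.7 across models, and toggling only `use_emotion_prompt`, isolates the emotional stimulus as the single varying factor between the baseline and EmotionPrompt conditions.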